# Pandas


## The Pandas Series Object
A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows.


In [15]:
import pandas as pd
import numpy as np

data = pd.Series([0.25,0.5,0.75,1])
print(data)

# In the preceding output, the Series wraps both a sequence of values and a
# sequence of indices; they can be accessed with the values and index attributes

print(F"Values = {data.values}")
print(F"indices = {data.index}")

# The index is an array-like object of type pd.Index

# Like a NumPy array, data can be accessed by the associated index via square-bracket notation

print(F"\n 2nd element = {data[1]}, return type = {type(data[1])}")
print(F"\n slicing \n{data[1:3]}, \n return type = {type(data[1:3])}")

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64
Values = [0.25 0.5  0.75 1.  ]
indices = RangeIndex(start=0, stop=4, step=1)

 2nd element = 0.5, return type = <class 'numpy.float64'>

 slicing 
1    0.50
2    0.75
dtype: float64, 
 return type = <class 'pandas.core.series.Series'>
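
The same behaviour can be checked on a Series built from a NumPy array rather than a list. This is only a small sketch; the names `arr`, `s`, `second` and `middle` are illustrative and do not appear in the cell above.

```python
import numpy as np
import pandas as pd

# Build a Series from a NumPy array (illustrative names, not from the cell above)
arr = np.array([0.25, 0.5, 0.75, 1.0])
s = pd.Series(arr)

# .values exposes the underlying NumPy array; .index holds the labels
values = s.values
index_labels = list(s.index)

# A single label returns a scalar, while a slice returns a new Series
second = s[1]
middle = s[1:3]

print(type(values).__name__, index_labels)
print(second, type(middle).__name__)
```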


In [2]:
# The explicit index definition gives the Series object additional capabilities
# For example, the index need not be an integer but can be of any desired type

data = pd.Series([0.25,0.5,0.75,1],index=['a','b','c','d'])
print(data)

# Accessing with index works
print(F"\n data['b'] = {data['b']}")

# We can even use noncontiguous or nonsequential indices:

data = pd.Series([0.25, 0.5, 0.75, 1.0],index=[2, 5, 3, 7])
print("\n")
print(data)

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

 data['b'] = 0.5


2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64
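
When the index holds integers that are not positions, it helps to state whether a lookup is by label or by position. The sketch below uses the `.loc` (label) and `.iloc` (position) accessors, which the cell above does not introduce; they are brought in here purely to make the distinction visible.

```python
import pandas as pd

# Same non-sequential integer index as in the cell above
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = data.loc[5]       # the entry whose label is 5
by_position = data.iloc[1]   # the second entry, which also carries label 5
third_row = data.iloc[2]     # the third entry, whose label is 3
label_three = data.loc[3]    # the entry whose label is 3

print(by_label, by_position, third_row, label_three)
```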


In [3]:
# Series as a specialized dictionary
# A dictionary is a structure that maps arbitrary keys to arbitrary values
# A Series object maps typed keys to typed values

# We can create a Series object from a dictionary to make the analogy clear
population_dict = { 'California': 38332521,
                    'Texas': 26448193,
                    'New York': 19651127,
                    'Florida': 19552860,
                    'Illinois': 12882135}
population = pd.Series(population_dict)
print(population)


# We can perform dictionary-style operations
print(F"\n dictionary styled operation {population['California']}")

# The Series also supports array-style operations like slicing
print(F"\n Array styled slicing operation = {population['California':'Florida']}")   # when slicing by label, the end label is included

# Other ways of creating Series object

# data can be a scalar, which is repeated to fill the specified index

print(F"\n  Series created from scalar \n {pd.Series(5, index=[100, 200, 300])}")

# When a dictionary is passed, the index defaults to the dictionary keys; to keep only selected keys, pass an index list
# In this case the Series is populated with the specified indices only

print(F"\n passing indices when passing dictionary \n {pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])}") 




California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

 dictionary styled operation 38332521

 Array styled slicing operation = California    38332521
Texas         26448193
New York      19651127
Florida       19552860
dtype: int64

  Series created from scalar 
 100    5
200    5
300    5
dtype: int64

 passing indices when passing dictionary 
 3    c
2    a
dtype: object

 Creating DataFrame from a single Series object
                   0
California  38332521
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135
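
The dictionary analogy extends a little further than item lookup. The following sketch, using a shortened population dictionary and the illustrative name `letters`, shows membership tests, `keys()`, and what happens when the requested index contains a label the dictionary lacks.

```python
import pandas as pd

population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127})

# Membership and keys() operate on the index labels, as with a dict
has_texas = 'Texas' in population
labels = list(population.keys())

# Asking for a label (4) that the dictionary lacks yields NaN for that entry
letters = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2, 4])

print(has_texas, labels)
print(letters)
```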


## DataFrames
A DataFrame is analogous to a two-dimensional array with flexible row indices and flexible column names.

Just as we can think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, we can think of a DataFrame as a sequence of aligned Series objects. Here, by aligned we mean they share the **same index**.
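
To see what alignment means in practice, here is a minimal sketch in which two Series list their labels in a different order and one Series is missing a label. The data is a shortened, illustrative subset of the state figures.

```python
import pandas as pd

# Labels are in a different order, and 'Florida' appears only in area
population = pd.Series({'California': 38332521, 'Texas': 26448193})
area = pd.Series({'Texas': 695662, 'California': 423967, 'Florida': 170312})

# The constructor matches rows by label, not by position;
# labels missing from one Series are filled with NaN
states = pd.DataFrame({'population': population, 'area': area})
print(states)
```

Values are paired by label, so the Texas area lands next to the Texas population regardless of the order in which the labels were given.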

In [16]:
# To demonstrate this, let's create a DataFrame by combining two Series objects

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
'Florida': 170312, 'Illinois': 149995}

area = pd.Series(area_dict)

print(F"\n Area = ")
print(area)

# We can use a dictionary to construct a single two dimensional object containing this information

states = pd.DataFrame({'population': population, 'area': area})
print("\n DataFrame created by merging two Series Objects with common indices")
print(states)

# Like the Series object, the DataFrame has an index attribute that gives access to the index labels

print("\n Index column of DataFrame")
print(states.index)

# Additionally, the DataFrame has a columns attribute, which is an Index object holding the column names.

print("\n Index object holding the names of columns")
print(states.columns)

# Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array where
# both rows and columns have a generalized index for accessing the data

# DataFrames as specialized dictionaries 
# We can think of DataFrames as special dictionaries where a DataFrame maps a column name 
# to a Series of column data

print("\n DataFrames as dictionaries where column name acts as key and column data acts as values")
print(states['area'])

# Creating DataFrame from a single Series object

print(F"\n Creating DataFrame from a single Series object")
print(pd.DataFrame(population, columns=['population']))  # specify the column name; otherwise the single column is labeled 0

# Creating DataFrame from a list of dicts 

print(F"\n Creating DataFrame from a list of dicts")
data = [{'a' : i + 1, 'b' : 2 * i + 1} for i in range(3)]
print(pd.DataFrame(data))

# Even if some keys in the dictionaries are missing, Pandas fills them in with NaN ("not a number") values

print("\n Pandas fills missing values with NaN")
print(pd.DataFrame([{'a' : 1, 'b' : 2}, {'b': 3, 'c':4}]))

# Creating DataFrame from a 2-dimensional numpy array

print(F"\n Creating DataFrame from a 2-dimensional numpy array")
print(pd.DataFrame(np.random.rand(3, 2),columns=['foo', 'bar'],index=['a', 'b', 'c']))



 Area = 
California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64

 DataFrame created by merging two Series Objects with common indices
            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995

 Index column of DataFrame
Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')

 Index object holding the names of columns
Index(['population', 'area'], dtype='object')

 DataFrames as dictionaries where column name acts as key and column data acts as values
California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

 Creating DataFrame from a single Series object
            population
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135

 Creatin
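
The dictionary view of a DataFrame can be probed directly. This sketch rebuilds a two-state version of `states` from nested dictionaries (outer keys become columns, inner keys become row labels) and shows that dictionary-style operations act on the column names, not the row labels.

```python
import pandas as pd

states = pd.DataFrame({
    'population': {'California': 38332521, 'Texas': 26448193},
    'area': {'California': 423967, 'Texas': 695662},
})

column_names = list(states.keys())   # keys() returns the column labels
has_area = 'area' in states          # membership checks the columns...
has_row_label = 'Texas' in states    # ...so a row label is not found
area = states['area']                # item access returns the column as a Series

print(column_names, has_area, has_row_label)
print(area)
```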

In [21]:
# The Index Object

# We have seen that both Series and DataFrame have an explicit index that lets us reference and modify data.
# An Index is an immutable array, or an ordered multiset, because an Index can contain duplicate values

# Let's construct Index from a list of integers 

print("Constructing  Index from a list of integers ")
ind = pd.Index([2,3,5,7,8])
print(ind)

# Index is an immutable array but we can perform operations like slicing, accessing elements through []
print("\n Accessing from []")
print(ind[1])

print("\n Slicing")
print(ind[::-1])

print("\n Index also has some attributes like numpy arrays")
print(F"size = {ind.size}, shape = {ind.shape}, dim = {ind.ndim}, dtype = {ind.dtype}")


# Index as ordered multiset

# Pandas objects are designed to facilitate operations such as joins across datasets,
# which depend on many aspects of set arithmetic. The Index object follows many of
# the conventions used by Python’s built-in set data structure, so that unions,
# intersections, differences, and other combinations can be computed in a familiar way:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

print(F"\n indA = {indA},\n indB = {indB}")
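# In recent pandas versions (2.0 and later) the |, & and ^ operators act element-wise
# on an Index rather than as set operations, so the explicit set methods are used here
print(F"\n union = {indA.union(indB)}")
print(F"\n Intersection = {indA.intersection(indB)}")
print(F"\n symmetric difference = {indA.symmetric_difference(indB)}")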



Constructing  Index from a list of integers 
Int64Index([2, 3, 5, 7, 8], dtype='int64')

 Accessing from []
3

 Slicing
Int64Index([8, 7, 5, 3, 2], dtype='int64')

 Index also has some attributes like numpy arrays
size = 5, shape = (5,), dim = 1, dtype = int64

 indA = Int64Index([1, 3, 5, 7, 9], dtype='int64'), indB = Int64Index([2, 3, 5, 7, 11], dtype='int64')

 union = Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

 Intersection = Int64Index([3, 5, 7], dtype='int64')

 symmetric difference = Int64Index([1, 2, 9, 11], dtype='int64')
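
Two properties mentioned in the comments above, immutability and the possibility of duplicate values, are not exercised by the cell itself. The sketch below checks both and adds a plain set difference; the names `mutated`, `dup` and `only_in_a` are illustrative.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 8])

# Item assignment on an Index is rejected with a TypeError
try:
    ind[1] = 0
    mutated = True
except TypeError:
    mutated = False

# Unlike a Python set, an Index may hold duplicate values
dup = pd.Index([1, 1, 2, 3])
unique_flag = dup.is_unique

# Set difference: labels present in indA but not in indB
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
only_in_a = indA.difference(indB)

print(mutated, unique_flag, list(only_in_a))
```

Immutability is what makes it safe to share one Index between several Series and DataFrames without one object silently changing the labels of another.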
