### Pandas Data Structures

#### Pandas is often used in tandem with numerical computing tools like NumPy and SciPy, analytical libraries like statsmodels and scikit-learn, and data visualization libraries like matplotlib.

In [1]:
#Import pandas package
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

### Series
#### A Series is a one-dimensional array-like object containing a sequence of values (of similar types to NumPy types) and an associated array of data labels, called its index.

In [2]:
obj = pd.Series([4,7,-5,3])
obj.values # The underlying data as a NumPy array
obj.index # Default index: a RangeIndex running from 0 to N-1, similar to Python's range

RangeIndex(start=0, stop=4, step=1)
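As a small self-contained sketch of the two parts of a Series described above, the snippet below pulls out the data and the default index separately. The variable names `values` and `index_labels` are illustrative only; `to_numpy()` is used here as an alternative to `.values`.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# The data part of the Series as a NumPy array
values = obj.to_numpy()

# With no explicit index, pandas assigns the integer labels 0..N-1
index_labels = list(obj.index)

print(values)        # [ 4  7 -5  3]
print(index_labels)  # [0, 1, 2, 3]
```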

In [3]:
#Create a Series whose index labels each data point
obj2 = pd.Series([4,7,-5,3],index = ['d','b','c','a']) #A string label is attached to each data point
obj2.index
obj2['a'] #Use a label to access the corresponding value of the Series

3

#### Using NumPy functions or NumPy-like operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link.

In [4]:
#Boolean filtering, scalar multiplication, and a NumPy math function
obj2[obj2>0] #Boolean filtering: keeps only the values greater than 0
obj2*2 #Scalar multiplication
np.exp(obj2) #Element-wise exponential of obj2

d      54.598150
b    1096.633158
c       0.006738
a      20.085537
dtype: float64
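To check that the labels really do travel with the values, the sketch below repeats the three operations and inspects the resulting index each time. The names `positive`, `doubled`, and `exponent` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'c', 'a'])

positive = obj2[obj2 > 0]   # boolean filter: surviving rows keep their labels
doubled = obj2 * 2          # scalar multiplication: every label is kept
exponent = np.exp(obj2)     # NumPy ufunc: returns a Series with the same index

print(list(positive.index))  # ['d', 'b', 'a']
print(doubled['c'])          # -10
```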

#### A Series can also be thought of as a fixed-length, ordered dict, as it is a mapping of index values to data values.

In [5]:
'b' in obj2

True
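A short sketch of the dict analogy: membership tests look at the index labels (the "keys"), not at the data values. The calls `Series.get` and `Series.to_dict` are used here as further dict-style conveniences; the label `'z'` is a made-up example of a missing key.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'c', 'a'])

has_label = 'b' in obj2        # True: 'b' is an index label
value_as_label = 7 in obj2     # False: 7 is a value, not a label
fallback = obj2.get('z', 0)    # dict-style lookup with a default for a missing label
as_dict = obj2.to_dict()       # convert back to a plain dict

print(has_label, value_as_label, fallback)  # True False 0
```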

In [6]:
#Creating a Series by passing a dict
sdata = {'Kokkedal':2980, 'Lyngby':2800, 'Ostebro': 2200,'Horsholm':2700}
obj3 = pd.Series(sdata) #The dict keys become the index labels
obj3

Kokkedal    2980
Lyngby      2800
Ostebro     2200
Horsholm    2700
dtype: int64

In [9]:
location = ['Norreport','Kokkedal','Lyngby','Ostebro'] #Defining the index
obj4 = pd.Series(sdata,index=location) #Only labels listed in location are kept; 'Norreport' is not a key of sdata, so its value is NaN
obj4
#Checking for missing (NaN) data
pd.isnull(obj4)

Norreport     True
Kokkedal     False
Lyngby       False
Ostebro      False
dtype: bool
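The sketch below repeats the missing-data check using the Series methods `isna`, `notna`, and `dropna`, which are assumed here as the method-style counterparts of `pd.isnull`. Note that `'Horsholm'` is dropped because it is not in the requested index, while `'Norreport'` appears with a missing value.

```python
import pandas as pd

sdata = {'Kokkedal': 2980, 'Lyngby': 2800, 'Ostebro': 2200, 'Horsholm': 2700}
location = ['Norreport', 'Kokkedal', 'Lyngby', 'Ostebro']
obj4 = pd.Series(sdata, index=location)

n_missing = int(obj4.isna().sum())        # number of labels without a value
present = list(obj4[obj4.notna()].index)  # labels that do have a value
cleaned = obj4.dropna()                   # Series with the missing entries removed

print(n_missing)  # 1
print(present)    # ['Kokkedal', 'Lyngby', 'Ostebro']
```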

#### A useful Series feature for many applications is that it automatically aligns by index label in arithmetic operations

In [10]:
# Element-wise addition of obj3 and obj4: values are aligned by index label, and a label that is missing from either Series gives NaN
obj3+obj4

Horsholm        NaN
Kokkedal     5960.0
Lyngby       5600.0
Norreport       NaN
Ostebro      4400.0
dtype: float64
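If NaN for non-overlapping labels is not wanted, one option is the `Series.add` method with `fill_value`, sketched below. This is only one possible way to handle it. A label that is missing on both sides (here `'Norreport'`, which is absent from `obj3` and NaN in `obj4`) still comes out as NaN.

```python
import pandas as pd

sdata = {'Kokkedal': 2980, 'Lyngby': 2800, 'Ostebro': 2200, 'Horsholm': 2700}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['Norreport', 'Kokkedal', 'Lyngby', 'Ostebro'])

aligned = obj3 + obj4                   # NaN wherever a label is missing on either side
filled = obj3.add(obj4, fill_value=0)   # treat a one-sided missing value as 0

print(filled['Horsholm'])               # 2700.0
print(pd.isna(filled['Norreport']))     # True
```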

#### Both the Series object itself and its index have a name attribute, which integrates with other key areas of pandas functionality:

In [11]:
obj4.name = 'Pincode' # giving Series object a name
obj4.index.name = 'located' #Giving Series index a name
obj4

located
Norreport       NaN
Kokkedal     2980.0
Lyngby       2800.0
Ostebro      2200.0
Name: Pincode, dtype: float64
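One place where these names show up, sketched below, is when a Series is converted to a DataFrame: the Series name becomes the column name and the index name is carried along. The variable names `as_frame` and `flat` are illustrative.

```python
import pandas as pd

sdata = {'Kokkedal': 2980, 'Lyngby': 2800, 'Ostebro': 2200, 'Horsholm': 2700}
obj4 = pd.Series(sdata, index=['Norreport', 'Kokkedal', 'Lyngby', 'Ostebro'])
obj4.name = 'Pincode'
obj4.index.name = 'located'

as_frame = obj4.to_frame()   # one-column DataFrame; the column is named 'Pincode'
flat = obj4.reset_index()    # index moved into a regular column named 'located'

print(list(as_frame.columns))  # ['Pincode']
print(list(flat.columns))      # ['located', 'Pincode']
```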

### DataFrame

#### A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row and column index; it can be thought of as a dict of Series all sharing the same index

#### Although a DataFrame is physically two-dimensional, you can use it to represent higher-dimensional data in a tabular format using hierarchical indexing.
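As a rough sketch of hierarchical indexing, the snippet below stores values keyed by two levels, (state, year), in a single Series and then pivots one level into columns. The figures are illustrative, and the level names `state` and `year` are chosen for this example.

```python
import pandas as pd

# Two-level (hierarchical) index: each value is identified by (state, year)
idx = pd.MultiIndex.from_tuples(
    [('AP', 1951), ('KA', 1952), ('TG', 2011), ('TG', 2012), ('TG', 2014)],
    names=['state', 'year'],
)
pop = pd.Series([6.0, 3.2, 3.85, 4.0, 4.2], index=idx)

tg = pop.loc['TG']           # select on the outer level: a Series indexed by year
wide = pop.unstack('year')   # move the 'year' level into columns: states x years

print(list(tg.index))        # [2011, 2012, 2014]
print(wide.shape)            # (3, 5)
```

Combinations that have no data, such as ('AP', 2011), appear as NaN in the unstacked table.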

In [13]:
#Constructing a DataFrame from a dict of equal-length lists or NumPy arrays
data = {'state':['AP','TG','TG','TG','KA'],
        'year': [1951,2011,2012,2014,1952],
        'pop':[6,3.85,4,4.2,3.2]
        }
frame = pd.DataFrame(data) #Each dict key becomes a column; the row index defaults to 0..N-1
frame

   state  year   pop
0     AP  1951  6.00
1     TG  2011  3.85
2     TG  2012  4.00
3     TG  2014  4.20
4     KA  1952  3.20
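The "dict of Series sharing the same index" view can be checked directly, as in the sketch below: selecting a column returns a Series whose index is the frame's row index, and a dict of those Series rebuilds the same frame. The names `state`, `same_index`, and `rebuilt` are illustrative.

```python
import pandas as pd

data = {'state': ['AP', 'TG', 'TG', 'TG', 'KA'],
        'year': [1951, 2011, 2012, 2014, 1952],
        'pop': [6, 3.85, 4, 4.2, 3.2]}
frame = pd.DataFrame(data)

state = frame['state']                         # one column, returned as a Series
same_index = state.index.equals(frame.index)   # the column shares the frame's row index

# A dict of the column Series reconstructs an equal DataFrame
rebuilt = pd.DataFrame({name: frame[name] for name in frame.columns})

print(type(state).__name__)   # Series
print(same_index)             # True
```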


In [14]:
pd.DataFrame(data,columns=['year','state','pop']) #columns= sets the order in which the columns appear

   year  state   pop
0  1951     AP  6.00
1  2011     TG  3.85
2  2012     TG  4.00
3  2014     TG  4.20
4  1952     KA  3.20


In [16]:
#If a column that is not a key of the dict is requested, it appears in the result filled with missing values (NaN)
frame2 = pd.DataFrame(data,columns=['year','state','pop','debt'],index=['a','b','c','d','e'])
frame2

   year  state   pop  debt
a  1951     AP  6.00   NaN
b  2011     TG  3.85   NaN
c  2012     TG  4.00   NaN
d  2014     TG  4.20   NaN
e  1952     KA  3.20   NaN
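A quick way to confirm that the extra column holds only missing values is sketched below; it also shows the custom row labels being used for lookup with `loc`. The variable names `all_missing` and `pop_c` are illustrative.

```python
import pandas as pd

data = {'state': ['AP', 'TG', 'TG', 'TG', 'KA'],
        'year': [1951, 2011, 2012, 2014, 1952],
        'pop': [6, 3.85, 4, 4.2, 3.2]}
frame2 = pd.DataFrame(data,
                      columns=['year', 'state', 'pop', 'debt'],
                      index=['a', 'b', 'c', 'd', 'e'])

# 'debt' is not a key of data, so every entry in that column is missing
all_missing = bool(frame2['debt'].isna().all())

# Rows are addressed by the labels passed as index
pop_c = frame2.loc['c', 'pop']

print(all_missing)  # True
print(pop_c)        # 4.0
```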
