The pandas library provides two core data structures: Series and DataFrame.

A Series is a one-dimensional, array-like object containing a sequence of values and an associated array of data labels, called its index.
The simplest Series is formed from only an array of data.

In [1]:
import pandas as pd
values = [4, 5, -9, 3, 8]  # named "values" so the built-in list type is not shadowed
obj = pd.Series(values)
obj

0    4
1    5
2   -9
3    3
4    8
dtype: int64

In [2]:
obj.values

array([ 4,  5, -9,  3,  8])

In [3]:
obj.index

RangeIndex(start=0, stop=5, step=1)

In [4]:
# pandas raises a ValueError if the number of values and the number of index labels do not match.

values = [4, 5, -9, 3, 8]
obj2 = pd.Series(values, index=['a', 'b', 'c', 'd', 'e'])
obj2

a    4
b    5
c   -9
d    3
e    8
dtype: int64
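The length check mentioned in cell [4] can be seen with a minimal sketch. The four-label index below is an assumption made up for illustration; the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

values = [4, 5, -9, 3, 8]

try:
    # Five values but only four labels: pandas refuses to build the Series.
    pd.Series(values, index=["a", "b", "c", "d"])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```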

In [5]:
obj2.values

array([ 4,  5, -9,  3,  8])

In [6]:
obj2.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [7]:
# Access values of obj2 through its custom index; pass a list of labels to select several items at once
obj2['a']

np.int64(4)

In [8]:
obj2["c"]

np.int64(-9)

In [9]:
obj2[["a", "d", "c"]]

a    4
d    3
c   -9
dtype: int64
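Square-bracket access on a Series with string labels looks values up by label. As a hedged sketch of the more explicit alternatives, `.loc` always selects by label and `.iloc` always selects by integer position; the variable names below are illustrative only.

```python
import pandas as pd

obj2 = pd.Series([4, 5, -9, 3, 8], index=["a", "b", "c", "d", "e"])

by_label = obj2.loc["c"]         # label-based lookup
by_position = obj2.iloc[2]       # position-based lookup (third element)
label_slice = obj2.loc["b":"d"]  # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```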

In [10]:
obj2[obj2 > 2]

a    4
b    5
d    3
e    8
dtype: int64

In [11]:
obj2 * 2

a     8
b    10
c   -18
d     6
e    16
dtype: int64

In [12]:
# e raised to the power of each value in obj2, where e is approximately 2.718
import numpy as np
np.exp(obj2)

a      54.598150
b     148.413159
c       0.000123
d      20.085537
e    2980.957987
dtype: float64

In [13]:
"f" in obj2

False

In [14]:
"a" in obj2

True

In [15]:
# A Python dictionary can be passed to pd.Series instead of a list; its keys become the index

sdata = {
    "Delhi" : 25000, 
    "Mumbai" : 54000, 
    "Kolkata" : 45000,
    "Bangalore" : 65000
}

obj3 = pd.Series(sdata)
obj3

Delhi        25000
Mumbai       54000
Kolkata      45000
Bangalore    65000
dtype: int64

In [16]:
indexList = ["Delhi", "Mumbai", "Bihar", "Gujarat", "Bangalore"]
obj4 = pd.Series(sdata, index = indexList)
obj4
# labels missing from sdata get NaN, which forces the dtype to float64;
# if every label in the index exists in sdata, the dtype stays int64

Delhi        25000.0
Mumbai       54000.0
Bihar            NaN
Gujarat          NaN
Bangalore    65000.0
dtype: float64
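The dtype behaviour noted in cell [16] can be checked with a small sketch. It assumes the same `sdata` dictionary as above; `dropped`, `as_int` and `only_known` are names introduced here for illustration.

```python
import pandas as pd

sdata = {"Delhi": 25000, "Mumbai": 54000, "Kolkata": 45000, "Bangalore": 65000}
obj4 = pd.Series(sdata, index=["Delhi", "Mumbai", "Bihar", "Gujarat", "Bangalore"])

dropped = obj4.dropna()           # NaN rows removed, but the dtype is still float64
as_int = dropped.astype("int64")  # an explicit cast is needed to get integers back

# Building the Series only from labels that exist in sdata never introduces NaN.
only_known = pd.Series(sdata, index=["Delhi", "Mumbai", "Bangalore"])

print(dropped.dtype, as_int.dtype, only_known.dtype)
```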

In [17]:
# isnull() and notnull() are available both as Series methods and as top-level pandas functions
obj4.isnull()

Delhi        False
Mumbai       False
Bihar         True
Gujarat       True
Bangalore    False
dtype: bool

In [18]:
obj4.notnull()

Delhi         True
Mumbai        True
Bihar        False
Gujarat      False
Bangalore     True
dtype: bool

In [19]:
pd.isnull(obj4)

Delhi        False
Mumbai       False
Bihar         True
Gujarat       True
Bangalore    False
dtype: bool

In [20]:
obj3 + obj4
# addition aligns on index labels; a label missing (or NaN) in either Series gives NaN in the result

Bangalore    130000.0
Bihar             NaN
Delhi         50000.0
Gujarat           NaN
Kolkata           NaN
Mumbai       108000.0
dtype: float64
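If NaN for unmatched labels is not wanted, one option is the `Series.add` method with `fill_value`. The sketch below assumes the same `obj3` and `obj4` as above; a label stays NaN only when it is missing on both sides.

```python
import pandas as pd

sdata = {"Delhi": 25000, "Mumbai": 54000, "Kolkata": 45000, "Bangalore": 65000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=["Delhi", "Mumbai", "Bihar", "Gujarat", "Bangalore"])

# A label missing on one side only is treated as 0 before adding.
total = obj3.add(obj4, fill_value=0)
print(total)
```

Here Kolkata keeps its value from `obj3`, while Bihar and Gujarat remain NaN because neither Series holds a value for them.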

In [21]:
obj4

Delhi        25000.0
Mumbai       54000.0
Bihar            NaN
Gujarat          NaN
Bangalore    65000.0
dtype: float64

In [22]:
obj3

Delhi        25000
Mumbai       54000
Kolkata      45000
Bangalore    65000
dtype: int64

In [23]:
obj3.name = "Number of luxury Cars"
obj3.index.name = "States of India"
obj3

States of India
Delhi        25000
Mumbai       54000
Kolkata      45000
Bangalore    65000
Name: Number of luxury Cars, dtype: int64

In [24]:
obj3.index 

Index(['Delhi', 'Mumbai', 'Kolkata', 'Bangalore'], dtype='object', name='States of India')

DataFrame: A DataFrame is a two-dimensional, labelled data structure commonly used in data analysis and manipulation.
It is similar to a table in a database or an Excel spreadsheet.
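A small sketch of the table analogy, using a made-up two-row frame: rows are labelled by the index, columns by name, and each column taken on its own is a Series.

```python
import pandas as pd

# Hypothetical two-row table, for illustration only.
df_small = pd.DataFrame({"Items": ["Yogurt", "Chips"], "Price": [45, 30]})

shape = df_small.shape               # (number of rows, number of columns)
column_names = list(df_small.columns)
price_column = df_small["Price"]     # a single column is a Series

print(shape, column_names, type(price_column).__name__)
```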

PYQ 3

In [25]:
data = {
    "Items" : ["Yogurt", "Chips", "Soda", "Yogurt", "Cake", "Chips", "Yogurt"],
    "Sugar Type" : ["Low Fat", "Regular", "Low Fat", "High Fat", "Regular", "Low Fat", "Regular"],
    "Price" : [45, 30, 50, 70, 140 ,40, 50]
}
df = pd.DataFrame(data)
df

    Items  Sugar Type  Price
0  Yogurt     Low Fat     45
1   Chips     Regular     30
2    Soda     Low Fat     50
3  Yogurt    High Fat     70
4    Cake     Regular    140
5   Chips     Low Fat     40
6  Yogurt     Regular     50


In [32]:
df[df["Sugar Type"] == "Low Fat"]["Price"].mean()

np.float64(45.0)

In [30]:
# Same calculation with a single .loc call: boolean row mask first, column label second
avg_price_low_fat = df.loc[df['Sugar Type'] == 'Low Fat', 'Price'].mean()
print(avg_price_low_fat)


45.0
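The filter-then-mean pattern above answers the question for one category. As a hedged alternative, `groupby` computes the mean price for every value of "Sugar Type" in one pass; the data is the same as in cell [25], and `means` is a name introduced here.

```python
import pandas as pd

data = {
    "Items": ["Yogurt", "Chips", "Soda", "Yogurt", "Cake", "Chips", "Yogurt"],
    "Sugar Type": ["Low Fat", "Regular", "Low Fat", "High Fat", "Regular", "Low Fat", "Regular"],
    "Price": [45, 30, 50, 70, 140, 40, 50],
}
df = pd.DataFrame(data)

# One mean per distinct "Sugar Type" value; the result is a Series indexed by category.
means = df.groupby("Sugar Type")["Price"].mean()
print(means)
```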


In [34]:
unique_items = df["Items"].unique()
unique_items

array(['Yogurt', 'Chips', 'Soda', 'Cake'], dtype=object)
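Alongside `unique()`, two related Series methods may be useful: `nunique()` returns how many distinct values there are, and `value_counts()` returns how often each one occurs. A minimal sketch on the same "Items" column:

```python
import pandas as pd

items = pd.Series(["Yogurt", "Chips", "Soda", "Yogurt", "Cake", "Chips", "Yogurt"])

n_unique = items.nunique()     # number of distinct items
counts = items.value_counts()  # occurrences per item, most frequent first

print(n_unique)
print(counts)
```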

In [43]:
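import pandas as pd
s1 = pd.Series(["Certificate", "Bachelor", "Master", "Doctorate"], index=[2, 4, 6, 8])
# reindex() returns a new Series and leaves s1 unchanged; the result is discarded here,
# so print(s1) still shows the original four rows
s1.reindex(range(10), method="ffill")
print(s1)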


2    Certificate
4       Bachelor
6         Master
8      Doctorate
dtype: object
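To actually see the forward-filled result, the return value of `reindex` has to be kept. A minimal sketch, where `s2` is a name introduced here:

```python
import pandas as pd

s1 = pd.Series(["Certificate", "Bachelor", "Master", "Doctorate"], index=[2, 4, 6, 8])

# Keep the new Series; labels 0 and 1 have no earlier value to carry forward, so they are NaN.
s2 = s1.reindex(range(10), method="ffill")
print(s2)
```

Labels 3, 5, 7 and 9 take the value of the label just before them, while `s1` itself still has only its original four entries.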
