## 1. What is Pandas & DataFrame

- Pandas is a Python library for working with data. It is built on top of NumPy, so operations on whole columns run as fast array operations. It is widely used in data analysis, visualization, data science, and machine learning.
- Pandas has two main types of objects: *Series* and *DataFrame*.
  - A *Series* is a one-dimensional labeled array that can hold any data type. Think of it as a single column in a spreadsheet.
  - A *DataFrame* is a two-dimensional tabular data structure with rows and columns, similar to an Excel spreadsheet.
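
As a small sketch of how the two object types relate: selecting one column of a DataFrame gives back a Series, and the values of that Series are available as a NumPy array. The column names and values below are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4.5, 19.0]})

column = df["temp_c"]        # selecting one column returns a Series
values = column.to_numpy()   # the values of the Series as a NumPy array

print(type(df).__name__)       # DataFrame
print(type(column).__name__)   # Series
print(type(values).__name__)   # ndarray
print(df.shape, column.shape)  # (2, 2) (2,)
```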

In [1]:
import pandas as pd

### - Series

In [2]:
data = [10,20,30,40]

series = pd.Series(data)
print(series)

0    10
1    20
2    30
3    40
dtype: int64


In [3]:
calories = { "Day 1": 1750, "Day 2" : 2100, "Day 3" : 1700}

series = pd.Series(calories)
print(series)

Day 1    1750
Day 2    2100
Day 3    1700
dtype: int64


#### Index of Series

In [4]:
data = [10,20,30,40]

series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)

a    10
b    20
c    30
d    40
dtype: int64


#### loc and iloc in Pandas

In [5]:
series.loc["b"] = 100
print(series.loc["b"])

100


In [6]:
print(series.iloc[0])
print(series.iloc[0:2])

10
a     10
b    100
dtype: int64
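
The difference between the two accessors is easiest to see with slices: `loc` selects by label and includes the end label, while `iloc` selects by integer position and excludes the end position. A minimal sketch on a fresh Series (the same starting values as above, before the update to `"b"`):

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = series.loc["a":"c"]   # label slice: the end label "c" is included
by_position = series.iloc[0:2]   # position slice: position 2 is excluded

print(by_label.tolist())     # [10, 20, 30]
print(by_position.tolist())  # [10, 20]
```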


#### Filtering in Pandas

In [7]:
print(series[series >= 40])

b    100
d     40
dtype: int64


### - DataFrame

In [8]:
data = { "Name" : ["Spongboob", "Patrick", "Squidward"],
         "Age" : [30, 35, 55] }

df = pd.DataFrame(data, index=["E1", "E2", "E3"])
print(df)

         Name  Age
E1  Spongboob   30
E2    Patrick   35
E3  Squidward   55


#### Adding New Column to DataFrame

In [9]:
df["Job"] = ["Cook", "N/A", "Cashier"]
print(df)

         Name  Age      Job
E1  Spongboob   30     Cook
E2    Patrick   35      N/A
E3  Squidward   55  Cashier


#### Adding New Rows (Data) to DataFrame

In [10]:
new_row = pd.DataFrame([{"Name" : "Sandy", "Age" : 28, "Job" : "Manager"}], index=["E4"])

df = pd.concat([df, new_row])

print(df)

         Name  Age      Job
E1  Spongboob   30     Cook
E2    Patrick   35      N/A
E3  Squidward   55  Cashier
E4      Sandy   28  Manager


In [11]:
new_rows = pd.DataFrame([{"Name" : "Suman", "Age" : 48, "Job" : "Manager"},
                        {"Name" : "Sam", "Age" : 33, "Job" : "Engineer"}], 
                       index=["E5", "E6"])

df = pd.concat([df, new_rows])

print(df)

         Name  Age       Job
E1  Spongboob   30      Cook
E2    Patrick   35       N/A
E3  Squidward   55   Cashier
E4      Sandy   28   Manager
E5      Suman   48   Manager
E6        Sam   33  Engineer
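
The examples above give every new row an explicit index label. As a sketch of what happens otherwise (toy data, not from the examples above): when both frames use the default integer index, `pd.concat` keeps the original labels, so labels can repeat unless `ignore_index=True` is passed.

```python
import pandas as pd

# Hypothetical toy frames, both with the default integer index.
df = pd.DataFrame({"Name": ["A", "B"], "Age": [30, 35]})  # index 0, 1
new_row = pd.DataFrame([{"Name": "C", "Age": 28}])         # index 0

kept = pd.concat([df, new_row])                            # labels are kept as-is
renumbered = pd.concat([df, new_row], ignore_index=True)   # labels are rebuilt

print(kept.index.tolist())        # [0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```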


## 2. Reading Data

In [12]:
df = pd.read_csv("pokemon_data.csv")
print(df)

      No        Name    Type1   Type2  Height  Weight  Legendary
0      1   Bulbasaur    Grass  Poison     0.7     6.9          0
1      2     Ivysaur    Grass  Poison     1.0    13.0          0
2      3    Venusaur    Grass  Poison     2.0   100.0          0
3      4  Charmander     Fire     NaN     0.6     8.5          0
4      5  Charmeleon     Fire     NaN     1.1    19.0          0
..   ...         ...      ...     ...     ...     ...        ...
145  146     Moltres     Fire  Flying     2.0    60.0          1
146  147     Dratini   Dragon     NaN     1.8     3.3          0
147  148   Dragonair   Dragon     NaN     4.0    16.5          0
148  149   Dragonite   Dragon  Flying     2.2   210.0          0
149  150      Mewtwo  Psychic     NaN     2.0   122.0          1

[150 rows x 7 columns]


In [13]:
print(df.head())

   No        Name  Type1   Type2  Height  Weight  Legendary
0   1   Bulbasaur  Grass  Poison     0.7     6.9          0
1   2     Ivysaur  Grass  Poison     1.0    13.0          0
2   3    Venusaur  Grass  Poison     2.0   100.0          0
3   4  Charmander   Fire     NaN     0.6     8.5          0
4   5  Charmeleon   Fire     NaN     1.1    19.0          0


In [14]:
print(df.tail())

      No       Name    Type1   Type2  Height  Weight  Legendary
145  146    Moltres     Fire  Flying     2.0    60.0          1
146  147    Dratini   Dragon     NaN     1.8     3.3          0
147  148  Dragonair   Dragon     NaN     4.0    16.5          0
148  149  Dragonite   Dragon  Flying     2.2   210.0          0
149  150     Mewtwo  Psychic     NaN     2.0   122.0          1


In [15]:
print(df.info())  # info() prints its report itself and returns None, hence the trailing "None"

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   No         150 non-null    int64  
 1   Name       150 non-null    object 
 2   Type1      150 non-null    object 
 3   Type2      67 non-null     object 
 4   Height     150 non-null    float64
 5   Weight     150 non-null    float64
 6   Legendary  150 non-null    int64  
dtypes: float64(2), int64(2), object(3)
memory usage: 8.3+ KB
None


In [16]:
print(df.describe())

               No      Height      Weight   Legendary
count  150.000000  150.000000  150.000000  150.000000
mean    75.500000    1.200000   46.231333    0.026667
std     43.445368    0.963634   59.547388    0.161647
min      1.000000    0.200000    0.100000    0.000000
25%     38.250000    0.700000    9.925000    0.000000
50%     75.500000    1.000000   30.000000    0.000000
75%    112.750000    1.500000   56.375000    0.000000
max    150.000000    8.800000  460.000000    1.000000


#### Selection By Column

In [17]:
print(df["Name"])  # use print(df["Name"].to_string()) to print every row

0       Bulbasaur
1         Ivysaur
2        Venusaur
3      Charmander
4      Charmeleon
          ...    
145       Moltres
146       Dratini
147     Dragonair
148     Dragonite
149        Mewtwo
Name: Name, Length: 150, dtype: object


In [18]:
print(df["Height"])

0      0.7
1      1.0
2      2.0
3      0.6
4      1.1
      ... 
145    2.0
146    1.8
147    4.0
148    2.2
149    2.0
Name: Height, Length: 150, dtype: float64


In [19]:
print(df[["Name", "Height", "Weight"]])

           Name  Height  Weight
0     Bulbasaur     0.7     6.9
1       Ivysaur     1.0    13.0
2      Venusaur     2.0   100.0
3    Charmander     0.6     8.5
4    Charmeleon     1.1    19.0
..          ...     ...     ...
145     Moltres     2.0    60.0
146     Dratini     1.8     3.3
147   Dragonair     4.0    16.5
148   Dragonite     2.2   210.0
149      Mewtwo     2.0   122.0

[150 rows x 3 columns]


#### Selection By Row

In [20]:
print(df.loc[df["Name"] == "Mewtwo"])

      No    Name    Type1 Type2  Height  Weight  Legendary
149  150  Mewtwo  Psychic   NaN     2.0   122.0          1


In [21]:
print(df.loc[0:11])  # loc slices by label and includes the end label; conditions can also be applied

    No        Name  Type1   Type2  Height  Weight  Legendary
0    1   Bulbasaur  Grass  Poison     0.7     6.9          0
1    2     Ivysaur  Grass  Poison     1.0    13.0          0
2    3    Venusaur  Grass  Poison     2.0   100.0          0
3    4  Charmander   Fire     NaN     0.6     8.5          0
4    5  Charmeleon   Fire     NaN     1.1    19.0          0
5    6   Charizard   Fire  Flying     1.7    90.5          0
6    7    Squirtle  Water     NaN     0.5     9.0          0
7    8   Wartortle  Water     NaN     1.0    22.5          0
8    9   Blastoise  Water     NaN     1.6    85.5          0
9   10    Caterpie    Bug     NaN     0.3     2.9          0
10  11     Metapod    Bug     NaN     0.7     9.9          0
11  12  Butterfree    Bug  Flying     1.1    32.0          0


In [22]:
print(df.iloc[0:11, 0:3])  # iloc accepts integer positions only; the end position is excluded

    No        Name  Type1
0    1   Bulbasaur  Grass
1    2     Ivysaur  Grass
2    3    Venusaur  Grass
3    4  Charmander   Fire
4    5  Charmeleon   Fire
5    6   Charizard   Fire
6    7    Squirtle  Water
7    8   Wartortle  Water
8    9   Blastoise  Water
9   10    Caterpie    Bug
10  11     Metapod    Bug


#### Filtering

In [23]:
tall_pokemon = df[df["Height"] >= 2]
print(tall_pokemon)

      No        Name    Type1    Type2  Height  Weight  Legendary
2      3    Venusaur    Grass   Poison     2.0   100.0          0
22    23       Ekans   Poison      NaN     2.0     6.9          0
23    24       Arbok   Poison      NaN     3.5    65.0          0
94    95        Onix     Rock   Ground     8.8   210.0          0
102  103   Exeggutor    Grass  Psychic     2.0   120.0          0
114  115  Kangaskhan   Normal      NaN     2.2    80.0          0
129  130    Gyarados    Water   Flying     6.5   235.0          0
130  131      Lapras    Water      Ice     2.5   220.0          0
142  143     Snorlax   Normal      NaN     2.1   460.0          0
145  146     Moltres     Fire   Flying     2.0    60.0          1
147  148   Dragonair   Dragon      NaN     4.0    16.5          0
148  149   Dragonite   Dragon   Flying     2.2   210.0          0
149  150      Mewtwo  Psychic      NaN     2.0   122.0          1


In [24]:
heavy_pokemon = df[df["Weight"] >= 100]
print(heavy_pokemon)

      No       Name     Type1    Type2  Height  Weight  Legendary
2      3   Venusaur     Grass   Poison     2.0   100.0          0
58    59   Arcanine      Fire      NaN     1.9   155.0          0
67    68    Machamp  Fighting      NaN     1.6   130.0          0
74    75   Graveler      Rock   Ground     1.0   105.0          0
75    76      Golem      Rock   Ground     1.4   300.0          0
86    87    Dewgong     Water      Ice     1.7   120.0          0
90    91   Cloyster     Water      Ice     1.5   132.5          0
94    95       Onix      Rock   Ground     8.8   210.0          0
102  103  Exeggutor     Grass  Psychic     2.0   120.0          0
110  111    Rhyhorn    Ground     Rock     1.0   115.0          0
111  112     Rhydon    Ground     Rock     1.9   120.0          0
129  130   Gyarados     Water   Flying     6.5   235.0          0
130  131     Lapras     Water      Ice     2.5   220.0          0
142  143    Snorlax    Normal      NaN     2.1   460.0          0
148  149  Dragonite    Dragon   Flying     2.2   210.0          0
149  150     Mewtwo   Psychic      NaN     2.0   122.0          1

In [25]:
legendary_pokemon = df[df["Legendary"] == 1]
print(legendary_pokemon)

      No      Name     Type1   Type2  Height  Weight  Legendary
143  144  Articuno       Ice  Flying     1.7    55.4          1
144  145    Zapdos  Electric  Flying     1.6    52.6          1
145  146   Moltres      Fire  Flying     2.0    60.0          1
149  150    Mewtwo   Psychic     NaN     2.0   122.0          1


In [26]:
tall_and_legendary_pokemon = df[(df["Height"] >= 2) & (df["Legendary"] == 1)]
print(tall_and_legendary_pokemon)

      No     Name    Type1   Type2  Height  Weight  Legendary
145  146  Moltres     Fire  Flying     2.0    60.0          1
149  150   Mewtwo  Psychic     NaN     2.0   122.0          1
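
Conditions can also be combined with `|` (or), negated with `~` (not), or expressed as a membership test with `isin`. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>=` and `==`. The sketch below uses a few made-up rows rather than the contents of `pokemon_data.csv`.

```python
import pandas as pd

# Hypothetical toy rows, not the contents of pokemon_data.csv.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Type1": ["Fire", "Water", "Grass", "Fire"],
    "Height": [0.6, 2.5, 2.0, 1.7],
    "Legendary": [0, 0, 1, 1],
})

tall_or_legendary = df[(df["Height"] >= 2) | (df["Legendary"] == 1)]  # either condition
not_legendary = df[~(df["Legendary"] == 1)]                           # negation
fire_or_water = df[df["Type1"].isin(["Fire", "Water"])]               # membership test

print(tall_or_legendary["Name"].tolist())  # ['B', 'C', 'D']
print(not_legendary["Name"].tolist())      # ['A', 'B']
print(fire_or_water["Name"].tolist())      # ['A', 'B', 'D']
```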


#### Aggregate Functions

In [27]:
print(df.mean(numeric_only=True))

No           75.500000
Height        1.200000
Weight       46.231333
Legendary     0.026667
dtype: float64


In [28]:
print(df.sum(numeric_only=True))

No           11325.0
Height         180.0
Weight        6934.7
Legendary        4.0
dtype: float64


In [29]:
print(df.count())

No           150
Name         150
Type1        150
Type2         67
Height       150
Weight       150
Legendary    150
dtype: int64
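
`count()` reports only non-missing values, which is why `Type2` shows 67 rather than 150. A minimal sketch of the same effect on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values in "Type2".
df = pd.DataFrame({
    "Type1": ["Fire", "Water", "Grass"],
    "Type2": ["Flying", np.nan, np.nan],
})

total_rows = len(df)                  # all rows, missing or not
non_missing = df["Type2"].count()     # NaN values are skipped
missing = df["Type2"].isna().sum()    # number of NaN values

print(total_rows, non_missing, missing)  # 3 1 2
```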


In [30]:
print(df.min(numeric_only=True))

No           1.0
Height       0.2
Weight       0.1
Legendary    0.0
dtype: float64


In [31]:
print(df.max(numeric_only=True))

No           150.0
Height         8.8
Weight       460.0
Legendary      1.0
dtype: float64


In [32]:
print(df["No"].mean())  # a single numeric column does not need numeric_only

75.5


#### Grouping

In [33]:
df = pd.read_csv("pokemon_data.csv")

In [34]:
group = df.groupby("Type1")

In [35]:
print(group)
print(group["Height"])

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000204604BB810>
<pandas.core.groupby.generic.SeriesGroupBy object at 0x00000204604ED850>


In [36]:
print(group["Height"].mean())

Type1
Bug         0.900000
Dragon      2.666667
Electric    0.855556
Fairy       0.950000
Fighting    1.185714
Fire        1.216667
Ghost       1.466667
Grass       1.083333
Ground      0.850000
Ice         1.550000
Normal      0.986364
Poison      1.221429
Psychic     1.371429
Rock        1.844444
Water       1.300000
Name: Height, dtype: float64


In [37]:
print(group["Height"].count())

Type1
Bug         12
Dragon       3
Electric     9
Fairy        2
Fighting     7
Fire        12
Ghost        3
Grass       12
Ground       8
Ice          2
Normal      22
Poison      14
Psychic      7
Rock         9
Water       28
Name: Height, dtype: int64
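
Several aggregations can be computed in one pass by handing a list of function names to `agg`. This is a sketch on made-up rows, not on `pokemon_data.csv`; the result has one row per group and one column per aggregation.

```python
import pandas as pd

# Hypothetical toy rows, used only for illustration.
df = pd.DataFrame({
    "Type1": ["Fire", "Fire", "Water", "Water", "Water"],
    "Height": [0.6, 1.7, 0.5, 1.0, 1.6],
})

# One row per Type1 value, with columns "count", "mean" and "max".
summary = df.groupby("Type1")["Height"].agg(["count", "mean", "max"])
print(summary)
```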
