#### Pandas - DataFrame and Series
Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame.
A Series is a one-dimensional, array-like object with a labeled index, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [4]:
# Series
import pandas as pd
data = [1 , 2 , 3 , 4 , 5]
series = pd.Series(data)
print("Series\n",series)
print(type(series))

Series
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [5]:
## Create a Series from dictionary elements
data = {'a' : 1 , 'b' : 2 , 'c' : 3}
series_dict = pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [6]:
## Create a Series with a custom index
data = [10 , 20 , 30]
index = ['a' , 'b' , 'c']
pd.Series(data , index = index)

a    10
b    20
c    30
dtype: int64
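
A custom index matters because pandas looks values up and aligns arithmetic by label rather than by position. The following is a minimal sketch of that behaviour; the variable names `s`, `other`, and `total` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same values and labels as the cell above
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Look up a value by its label
print(s["b"])

# Arithmetic aligns on labels; labels missing from one side become NaN
other = pd.Series([1, 2], index=["b", "c"])
print(s + other)

# fill_value treats a missing label as 0 instead of producing NaN
total = s.add(other, fill_value=0)
print(total)
```

Using `add` with `fill_value` is one way to keep unmatched labels; whether NaN or a default is the right choice depends on the data.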

In [8]:
## DataFrame
## Create a DataFrame from a dictionary of lists

data = {
    'Name': ['Krish' , 'John' , 'Jack'] ,
    'Age' :[25 , 30 , 45] ,
    'City':['Bangalore' , 'New York' , 'Florida']
}
df = pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age       City
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45    Florida
<class 'pandas.core.frame.DataFrame'>
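
To make "heterogeneous" and "size-mutable" concrete, here is a small sketch, assuming a frame like the one above. The name `people` and the `Senior` column are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical sample data, mirroring the dictionary-of-lists example
people = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Age": [25, 30, 45],
})

# Heterogeneous: each column is a Series with its own dtype
print(people.dtypes)

# Size-mutable: assigning to a new column label adds a column to the frame
people["Senior"] = people["Age"] > 40
print(people.shape)
```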


In [9]:
## Create a DataFrame from a list of dictionaries

data = [
    {'Name' : 'Krish' , 'Age': 32, 'City': 'Bangalore'} ,
    {'Name' : 'John' , 'Age': 34, 'City': 'Bangalore'} ,
    {'Name' : 'Bappy' , 'Age': 25, 'City': 'Bangalore'} ,
    {'Name' : 'Jack' , 'Age': 30, 'City': 'Bangalore'}
]
df = pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age       City
0  Krish   32  Bangalore
1   John   34  Bangalore
2  Bappy   25  Bangalore
3   Jack   30  Bangalore
<class 'pandas.core.frame.DataFrame'>


In [12]:
## Read a CSV file into a DataFrame and show the first five rows
df = pd.read_csv('basic-data.csv')
df.head(5)

   ID    Name  Age    Country                Email
0   1  Name_1   62  Country_1  email_1@example.com
1   2  Name_2   48  Country_2  email_2@example.com
2   3  Name_3   61  Country_3  email_3@example.com
3   4  Name_4   32  Country_4  email_4@example.com
4   5  Name_5   69  Country_5  email_5@example.com
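
Since `basic-data.csv` is not reproduced here, the sketch below reads a small in-memory CSV instead. The CSV text is made up for illustration and only loosely mirrors the file's columns; `index_col` is a standard `read_csv` parameter that turns a column into the row index.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "ID,Name,Age\n1,Name_1,62\n2,Name_2,48\n3,Name_3,61\n"

# read_csv accepts a path or any file-like object
frame = pd.read_csv(io.StringIO(csv_text))
print(frame.head(2))

# Use the ID column as the row index instead of the default 0..n-1
indexed = pd.read_csv(io.StringIO(csv_text), index_col="ID")
print(indexed)
```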


In [13]:
df.tail(5)

     ID      Name  Age    Country                  Email
95   96   Name_96   60  Country_6   email_96@example.com
96   97   Name_97   26  Country_7   email_97@example.com
97   98   Name_98   52  Country_8   email_98@example.com
98   99   Name_99   24  Country_9   email_99@example.com
99  100  Name_100   55  Country_0  email_100@example.com


In [14]:
## Accessing data from a DataFrame
df

     ID     Name  Age    Country                 Email
0     1   Name_1   62  Country_1   email_1@example.com
1     2   Name_2   48  Country_2   email_2@example.com
2     3   Name_3   61  Country_3   email_3@example.com
3     4   Name_4   32  Country_4   email_4@example.com
4     5   Name_5   69  Country_5   email_5@example.com
..  ...      ...  ...        ...                   ...
95   96  Name_96   60  Country_6  email_96@example.com
96   97  Name_97   26  Country_7  email_97@example.com
97   98  Name_98   52  Country_8  email_98@example.com
98   99  Name_99   24  Country_9  email_99@example.com


In [15]:
df['Name']

0       Name_1
1       Name_2
2       Name_3
3       Name_4
4       Name_5
        ...   
95     Name_96
96     Name_97
97     Name_98
98     Name_99
99    Name_100
Name: Name, Length: 100, dtype: object

In [16]:
data

[{'Name': 'Krish', 'Age': 32, 'City': 'Bangalore'},
 {'Name': 'John', 'Age': 34, 'City': 'Bangalore'},
 {'Name': 'Bappy', 'Age': 25, 'City': 'Bangalore'},
 {'Name': 'Jack', 'Age': 30, 'City': 'Bangalore'}]

In [17]:
type(df['Name'])

pandas.core.series.Series
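
Single brackets with one column name return a Series, as shown above. A list of names inside the brackets returns a DataFrame instead. A minimal sketch, using a hypothetical frame rather than the CSV data:

```python
import pandas as pd

# Hypothetical sample frame
frame = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Age": [25, 30, 45],
    "City": ["Bangalore", "New York", "Florida"],
})

one_column = frame["Name"]            # Series
as_frame = frame[["Name"]]            # DataFrame with one column
two_columns = frame[["Name", "Age"]]  # DataFrame with two columns

print(type(one_column), type(as_frame), two_columns.shape)
```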

In [23]:
## Label-based access: row label 0, column 'Name'
df.loc[0 , 'Name']

'Name_1'

In [24]:
## Position-based access: first row, third column (Age)
df.iloc[0 , 2]

np.int64(62)
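
With the default 0..n-1 index, `loc` and `iloc` can look interchangeable. The difference shows up once the row labels are not positions. The sketch below assumes a small hypothetical frame with labels 10, 20, 30.

```python
import pandas as pd

# Hypothetical frame whose row labels differ from row positions
frame = pd.DataFrame(
    {"Name": ["Krish", "John", "Jack"], "Age": [25, 30, 45]},
    index=[10, 20, 30],
)

by_label = frame.loc[20, "Name"]   # row labelled 20, column "Name"
by_position = frame.iloc[0, 1]     # first row, second column
print(by_label, by_position)

# Label slices include both endpoints; positional slices exclude the stop
print(len(frame.loc[10:20]), len(frame.iloc[0:1]))
```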

In [25]:
## Select the Name column (returns a Series)
df['Name']

0       Name_1
1       Name_2
2       Name_3
3       Name_4
4       Name_5
        ...   
95     Name_96
96     Name_97
97     Name_98
98     Name_99
99    Name_100
Name: Name, Length: 100, dtype: object

In [27]:
## Accessing a specified element using at
df.at[1 , 'Age']

np.int64(48)

In [28]:
df.at[1 , 'Name']

'Name_2'

In [29]:
## Accessing a specified element using iat
df.iat[2,2]

np.int64(61)
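
`at` and `iat` read a single cell, and they can also write one. A short sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical sample frame
frame = pd.DataFrame({"Name": ["Krish", "John"], "Age": [25, 30]})

# at: row label and column label
frame.at[1, "Age"] = 31

# iat: row position and column position
frame.iat[0, 1] = 26

print(frame)
```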

In [43]:
## Data manipulation with DataFrame
import numpy as np
numbers = np.arange(100,200).tolist()
df["Salary"] = numbers
df

     ID     Name  Age    Country                 Email  Salary
0     1   Name_1   62  Country_1   email_1@example.com     100
1     2   Name_2   48  Country_2   email_2@example.com     101
2     3   Name_3   61  Country_3   email_3@example.com     102
3     4   Name_4   32  Country_4   email_4@example.com     103
4     5   Name_5   69  Country_5   email_5@example.com     104
..  ...      ...  ...        ...                   ...     ...
95   96  Name_96   60  Country_6  email_96@example.com     195
96   97  Name_97   26  Country_7  email_97@example.com     196
97   98  Name_98   52  Country_8  email_98@example.com     197
98   99  Name_99   24  Country_9  email_99@example.com     198


In [44]:
## Remove a column
df.drop('Salary',axis = 1,inplace = True)


In [45]:
df

     ID     Name  Age    Country                 Email
0     1   Name_1   62  Country_1   email_1@example.com
1     2   Name_2   48  Country_2   email_2@example.com
2     3   Name_3   61  Country_3   email_3@example.com
3     4   Name_4   32  Country_4   email_4@example.com
4     5   Name_5   69  Country_5   email_5@example.com
..  ...      ...  ...        ...                   ...
95   96  Name_96   60  Country_6  email_96@example.com
96   97  Name_97   26  Country_7  email_97@example.com
97   98  Name_98   52  Country_8  email_98@example.com
98   99  Name_99   24  Country_9  email_99@example.com


In [46]:
## Add 1 to every value in the Age column
df['Age'] = df['Age'] + 1
df

     ID     Name  Age    Country                 Email
0     1   Name_1   63  Country_1   email_1@example.com
1     2   Name_2   49  Country_2   email_2@example.com
2     3   Name_3   62  Country_3   email_3@example.com
3     4   Name_4   33  Country_4   email_4@example.com
4     5   Name_5   70  Country_5   email_5@example.com
..  ...      ...  ...        ...                   ...
95   96  Name_96   61  Country_6  email_96@example.com
96   97  Name_97   27  Country_7  email_97@example.com
97   98  Name_98   53  Country_8  email_98@example.com
98   99  Name_99   25  Country_9  email_99@example.com
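
Column arithmetic like this is applied element-wise to the whole column, with no explicit loop. The sketch below shows the same idea on a hypothetical frame; the `YearlySalary` column is an invented example of deriving a new column from an existing one.

```python
import pandas as pd

# Hypothetical sample frame
frame = pd.DataFrame({"Age": [25, 30, 45], "Salary": [100, 101, 102]})

# Element-wise update of an existing column
frame["Age"] = frame["Age"] + 1

# Derive a new column from an existing one
frame["YearlySalary"] = frame["Salary"] * 12

print(frame)
```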


In [47]:
## Remove the row with index label 0 (returns a new DataFrame; df itself is unchanged)
df.drop(0)

     ID     Name  Age    Country                 Email
1     2   Name_2   49  Country_2   email_2@example.com
2     3   Name_3   62  Country_3   email_3@example.com
3     4   Name_4   33  Country_4   email_4@example.com
4     5   Name_5   70  Country_5   email_5@example.com
5     6   Name_6   33  Country_6   email_6@example.com
..  ...      ...  ...        ...                   ...
95   96  Name_96   61  Country_6  email_96@example.com
96   97  Name_97   27  Country_7  email_97@example.com
97   98  Name_98   53  Country_8  email_98@example.com
98   99  Name_99   25  Country_9  email_99@example.com
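
Without `inplace = True`, `drop` returns a new DataFrame and leaves the original untouched, so the result has to be assigned to keep it. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical sample frame
frame = pd.DataFrame({"Name": ["Krish", "John", "Jack"], "Age": [25, 30, 45]})

# Dropping a row by label returns a new frame
without_first = frame.drop(0)
print(len(frame), len(without_first))

# The columns keyword is an alternative to axis=1 for dropping columns
trimmed = frame.drop(columns=["Age"])
print(list(trimmed.columns))

# The remaining rows keep their old labels unless the index is reset
renumbered = without_first.reset_index(drop=True)
print(renumbered.index.tolist())
```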


In [56]:
## Display the data types of each column
print("Data Types\n" , df.dtypes)

## Describe the DataFrame
print("\n\nStatistical summary\n" , df.describe())

Data Types
 ID          int64
Name       object
Age         int64
Country    object
Email      object
dtype: object


Statistical summary
                ID         Age
count  100.000000  100.000000
mean    50.500000   45.530000
std     29.011492   15.190012
min      1.000000   19.000000
25%     25.750000   33.000000
50%     50.500000   44.500000
75%     75.250000   60.250000
max    100.000000   70.000000


In [57]:
df.describe()

               ID         Age
count  100.000000  100.000000
mean    50.500000   45.530000
std     29.011492   15.190012
min      1.000000   19.000000
25%     25.750000   33.000000
50%     50.500000   44.500000
75%     75.250000   60.250000
max    100.000000   70.000000
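
By default `describe()` summarises only the numeric columns, which is why Name, Country, and Email are absent above. Calling it on text columns gives a different summary (count, unique, top, freq). A sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical sample frame with two text columns and one numeric column
frame = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "City": ["Bangalore", "New York", "Bangalore"],
    "Age": [25, 30, 45],
})

# Numeric summary: only Age appears
print(frame.describe())

# Text summary: select the text columns explicitly
text_summary = frame[["Name", "City"]].describe()
print(text_summary)
```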
