### Pandas-DataFrame and Series

Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: **Series** and **DataFrame**. A **Series** is a **one-dimensional**, array-like object, while a **DataFrame** is a **two-dimensional**, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [1]:
import pandas as pd

### 1. <u>Series</u>:

In [None]:
# A pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a single column in a table.

data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
print(type(series))

0    10
1    20
2    30
3    40
4    50
dtype: int64
<class 'pandas.core.series.Series'>


### Another way to create a Series: from a dictionary (the keys become the index labels)

In [None]:
data = {'a': 10, 'b': 20, 'c': 30}
series_dict = pd.Series(data)
print(series_dict)
print(type(series_dict))

a    10
b    20
c    30
dtype: int64
<class 'pandas.core.series.Series'>
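
Because the dictionary keys become the index, values can be looked up either by label or by integer position. The following is a small sketch that rebuilds the same `series_dict` so it runs on its own:

```python
import pandas as pd

# Same data as the cell above: dictionary keys become the index labels.
series_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

labels = series_dict.index.tolist()   # the index labels, in insertion order
by_label = series_dict['b']           # lookup by label
by_position = series_dict.iloc[1]     # lookup by integer position (second element)

print(labels)
print(by_label, by_position)
```

Both lookups return the same element here, since `'b'` is the label of the second entry.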


### You can give the data a custom index

In [None]:
data = [100,200,300]
index = ['1', '2', '3']
series_custom_index = pd.Series(data, index=index)
print(series_custom_index)
print(type(series_custom_index))

1    100
2    200
3    300
dtype: int64
<class 'pandas.core.series.Series'>
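
Note that the custom index above is made of strings (`'1'`, `'2'`, `'3'`), not integers. A minimal sketch of what that means for lookups, rebuilding the same Series so it is self-contained:

```python
import pandas as pd

series_custom_index = pd.Series([100, 200, 300], index=['1', '2', '3'])

# Label-based lookup must use the string label.
first_by_label = series_custom_index.loc['1']

# Position-based lookup ignores the labels entirely.
first_by_position = series_custom_index.iloc[0]

# The integer 1 is not one of the labels, so .loc[1] raises KeyError.
try:
    series_custom_index.loc[1]
    int_label_found = True
except KeyError:
    int_label_found = False

print(first_by_label, first_by_position, int_label_found)
```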


### 2. <u>DataFrame</u>:

In [13]:
# A DataFrame is a two-dimensional table with labeled rows and columns
# Create a DataFrame from a dictionary of lists

data = {
    'Name':['Ashuto', 'Krish', 'Kalyan'],
    'Age':[25, 30, 19],
    'City':['New York', 'Banglore', 'Hyderabad']
}

df = pd.DataFrame(data)
print(df)
print(type(df))

     Name  Age       City
0  Ashuto   25   New York
1   Krish   30   Banglore
2  Kalyan   19  Hyderabad
<class 'pandas.core.frame.DataFrame'>


In [14]:
# Create a DataFrame from a list of dictionaries (one dictionary per row)

data = [
    {'Name':'Ashuto', 'Age':25, 'City':'New York'},
    {'Name':'Krish', 'Age':30, 'City':'Banglore'},
    {'Name':'Kalyan', 'Age':35, 'City':'Hyderabad'}
]

df2 = pd.DataFrame(data)
df2

|   | Name | Age | City |
|---|---|---|---|
| 0 | Ashuto | 25 | New York |
| 1 | Krish | 30 | Banglore |
| 2 | Kalyan | 35 | Hyderabad |
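
When a DataFrame is built from a list of dictionaries, the dictionaries do not need to share exactly the same keys. As a small sketch (the second record, with no `'City'` key, is a hypothetical example and not part of the original data), pandas fills the missing cell with a missing-value marker:

```python
import pandas as pd

records = [
    {'Name': 'Ashuto', 'Age': 25, 'City': 'New York'},
    {'Name': 'Krish', 'Age': 30},  # hypothetical record without a 'City' key
]

df_missing = pd.DataFrame(records)
print(df_missing)

# isna() flags the cell that had no value in the input.
city_missing = df_missing['City'].isna().tolist()
print(city_missing)
```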


In [19]:
df = pd.read_csv('dataset.csv')
df.head(5)

|   | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced ... | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a te... | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fa... | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |
| 2 | 3 | Split | Horror,Thriller | Three girls are kidnapped by a man with a diag... | M. Night Shyamalan | James McAvoy, Anya Taylor-Joy, Haley Lu Richar... | 2016 | 117 | 7.3 | 157606 | 138.12 | 62.0 |
| 3 | 4 | Sing | Animation,Comedy,Family | In a city of humanoid animals, a hustling thea... | Christophe Lourdelet | Matthew McConaughey,Reese Witherspoon, Seth Ma... | 2016 | 108 | 7.2 | 60545 | 270.32 | 59.0 |
| 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | A secret government agency recruits some of th... | David Ayer | Will Smith, Jared Leto, Margot Robbie, Viola D... | 2016 | 123 | 6.2 | 393727 | 325.02 | 40.0 |


In [20]:
df.tail(3)

|   | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 997 | 998 | Step Up 2: The Streets | Drama,Music,Romance | Romantic sparks occur between two dance studen... | Jon M. Chu | Robert Hoffman, Briana Evigan, Cassie Ventura,... | 2008 | 98 | 6.2 | 70699 | 58.01 | 50.0 |
| 998 | 999 | Search Party | Adventure,Comedy | A pair of friends embark on a mission to reuni... | Scot Armstrong | Adam Pally, T.J. Miller, Thomas Middleditch,Sh... | 2014 | 93 | 5.6 | 4881 | NaN | 22.0 |
| 999 | 1000 | Nine Lives | Comedy,Family,Fantasy | A stuffy businessman finds himself trapped ins... | Barry Sonnenfeld | Kevin Spacey, Jennifer Garner, Robbie Amell,Ch... | 2016 | 87 | 5.3 | 12435 | 19.64 | 11.0 |
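
The cells above depend on a local `dataset.csv` file. To try `read_csv`, `head`, and `tail` without that file, here is a minimal sketch that reads a few rows from an in-memory string instead. The `csv_text` content is a cut-down, illustrative subset of the columns shown above, not the real file:

```python
import io
import pandas as pd

# Illustrative stand-in for dataset.csv (only three of the columns, four rows).
csv_text = """Rank,Title,Year
1,Guardians of the Galaxy,2014
2,Prometheus,2012
3,Split,2016
4,Sing,2016
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
movies = pd.read_csv(io.StringIO(csv_text))

print(movies.shape)    # (rows, columns)
print(movies.head(2))  # first two rows
print(movies.tail(1))  # last row
```

Called without an argument, `head()` and `tail()` both default to five rows.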


### Accessing data from a DataFrame

In [21]:
df2

|   | Name | Age | City |
|---|---|---|---|
| 0 | Ashuto | 25 | New York |
| 1 | Krish | 30 | Banglore |
| 2 | Kalyan | 35 | Hyderabad |


In [25]:
print(df2['Name']) # select the 'Name' column; a single column is returned as a Series
print(type(df2['Name']))

0    Ashuto
1     Krish
2    Kalyan
Name: Name, dtype: object
<class 'pandas.core.series.Series'>
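
Single brackets with one column name return a Series, as shown above. Passing a list of names (double brackets) returns a DataFrame instead, even for one column. A short sketch, using a rebuilt copy of the same data under the hypothetical name `people`:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Ashuto', 'Krish', 'Kalyan'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Banglore', 'Hyderabad'],
})

as_series = people['Name']             # one column name -> Series
as_frame = people[['Name']]            # list with one name -> DataFrame
two_columns = people[['Name', 'City']] # list with several names -> DataFrame

print(type(as_series))
print(type(as_frame))
print(two_columns.shape)
```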


In [None]:
df2.iloc[0] # access the first row by integer position

Name      Ashuto
Age           25
City    New York
Name: 0, dtype: object

In [29]:
# loc selects by index label; it returns the same row here because the default index labels are 0, 1, 2
df2.loc[0]

Name      Ashuto
Age           25
City    New York
Name: 0, dtype: object
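
`iloc` and `loc` only agree above because the default index labels happen to equal the row positions. A minimal sketch of how they differ once the index is something else (the labels `10, 20, 30` and the name `people` are made up for illustration):

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Ashuto', 'Krish', 'Kalyan'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical, non-default index labels
)

by_position = people.iloc[0]  # first row, whatever its label is
by_label = people.loc[10]     # the row whose label is 10

# Label 0 does not exist in this index, so people.loc[0] would raise KeyError.
label_zero_exists = 0 in people.index

print(by_position['Name'], by_label['Name'], label_zero_exists)
```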

#### Using the `at` and `iat` accessors

In [41]:
df2

|   | Name | Age | City |
|---|---|---|---|
| 0 | Ashuto | 25 | New York |
| 1 | Krish | 30 | Banglore |
| 2 | Kalyan | 35 | Hyderabad |


In [None]:
# at accesses a single element by row label and column name
df2.at[2,'Age']

np.int64(35)

In [44]:
df2.at[2,'Name']

'Kalyan'

In [45]:
# iat accesses a single element by integer position (row 2, column 2)

df2.iat[2,2]

'Hyderabad'
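
`at` and `iat` can also be used on the left-hand side of an assignment to overwrite a single cell. A small sketch on a rebuilt copy of the data (the name `people` and the new values `36` and `'Boston'` are illustrative):

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Ashuto', 'Krish', 'Kalyan'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Banglore', 'Hyderabad'],
})

people.at[2, 'Age'] = 36      # set by row label and column name
people.iat[0, 2] = 'Boston'   # set by row position and column position

print(people)
```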

#### Data addition and deletion in Pandas

In [46]:
df2['Salary'] = [50000, 60000, 55555000]
df2

|   | Name | Age | City | Salary |
|---|---|---|---|---|
| 0 | Ashuto | 25 | New York | 50000 |
| 1 | Krish | 30 | Banglore | 60000 |
| 2 | Kalyan | 35 | Hyderabad | 55555000 |


In [47]:
# Deleting a column: drop returns a new DataFrame and leaves df2 itself unchanged
df2.drop('Salary', axis=1)

|   | Name | Age | City |
|---|---|---|---|
| 0 | Ashuto | 25 | New York |
| 1 | Krish | 30 | Banglore |
| 2 | Kalyan | 35 | Hyderabad |


In [48]:
df2

|   | Name | Age | City | Salary |
|---|---|---|---|---|
| 0 | Ashuto | 25 | New York | 50000 |
| 1 | Krish | 30 | Banglore | 60000 |
| 2 | Kalyan | 35 | Hyderabad | 55555000 |


In [49]:
# To remove the column from df2 itself, pass inplace=True (or reassign the result: df2 = df2.drop('Salary', axis=1))
df2.drop('Salary', axis=1, inplace=True)

In [50]:
df2

|   | Name | Age | City |
|---|---|---|---|
| 0 | Ashuto | 25 | New York |
| 1 | Krish | 30 | Banglore |
| 2 | Kalyan | 35 | Hyderabad |
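
As an alternative to `inplace=True`, the result of `drop` can be assigned back to the same name. The sketch below uses the `columns=` keyword, which is equivalent to passing the label with `axis=1`; the small `people` DataFrame is a stand-in for `df2`:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Ashuto', 'Krish'],
    'Salary': [50000, 60000],
})

# drop returns a new DataFrame; the original still has the column.
without_salary = people.drop(columns='Salary')
columns_before = list(people.columns)

# Reassigning makes the change stick without using inplace=True.
people = people.drop(columns='Salary')
columns_after = list(people.columns)

print(columns_before, columns_after)
```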


In [52]:
# Add values element-wise to the Age column; this returns a new Series and leaves df2 unchanged

df2['Age'] + [5, 10, 15]

0    30
1    40
2    50
Name: Age, dtype: int64
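
The addition above only computes a new Series. To store the result, assign it back to the column. A minimal sketch on a stand-in DataFrame named `people`:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Ashuto', 'Krish', 'Kalyan'],
    'Age': [25, 30, 35],
})

older = people['Age'] + [5, 10, 15]   # element-wise; people is not modified
ages_before = people['Age'].tolist()

people['Age'] = older                 # assign back to update the column
ages_after = people['Age'].tolist()

people['Age'] = people['Age'] + 1     # a scalar is applied to every row
ages_plus_one = people['Age'].tolist()

print(ages_before, ages_after, ages_plus_one)
```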

#### Getting a Statistical and Concise Summary

In [None]:
df.describe() # statistical summary of the numeric columns of the DataFrame loaded from the CSV file

|   | Rank | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 872.0 | 936.0 |
| mean | 500.5 | 2012.783 | 113.172 | 6.7232 | 169808.3 | 82.956376 | 58.985043 |
| std | 288.819436 | 3.205962 | 18.810908 | 0.945429 | 188762.6 | 103.25354 | 17.194757 |
| min | 1.0 | 2006.0 | 66.0 | 1.9 | 61.0 | 0.0 | 11.0 |
| 25% | 250.75 | 2010.0 | 100.0 | 6.2 | 36309.0 | 13.27 | 47.0 |
| 50% | 500.5 | 2014.0 | 111.0 | 6.8 | 110799.0 | 47.985 | 59.5 |
| 75% | 750.25 | 2016.0 | 123.0 | 7.4 | 239909.8 | 113.715 | 72.0 |
| max | 1000.0 | 2016.0 | 191.0 | 9.0 | 1791916.0 | 936.63 | 100.0 |


In [None]:
df.info() # concise summary of the csv file dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Title               1000 non-null   object 
 2   Genre               1000 non-null   object 
 3   Description         1000 non-null   object 
 4   Director            1000 non-null   object 
 5   Actors              1000 non-null   object 
 6   Year                1000 non-null   int64  
 7   Runtime (Minutes)   1000 non-null   int64  
 8   Rating              1000 non-null   float64
 9   Votes               1000 non-null   int64  
 10  Revenue (Millions)  872 non-null    float64
 11  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB
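
In both summaries, `Revenue (Millions)` and `Metascore` show fewer than 1000 values because `count` and `Non-Null Count` skip missing entries. A small sketch of the same effect on made-up data (the `scores` DataFrame and its values are illustrative), together with `isna().sum()` to count the gaps directly:

```python
import pandas as pd

scores = pd.DataFrame({
    'Rating': [8.1, 7.0, 7.3],
    'Metascore': [76.0, None, 62.0],  # None becomes NaN in a float column
})

summary = scores.describe()
missing_per_column = scores.isna().sum()

print(summary.loc['count'])   # non-missing values per column
print(missing_per_column)     # missing values per column
```

The other statistics, such as `mean`, are also computed over the non-missing values only.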
