#### Pandas-DataFrame And Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. 
It provides two primary data structures: Series and DataFrame. 

* A Series is a one-dimensional array-like object.
* A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In a DataFrame, each column is a Series.
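
To make the last point concrete, here is a minimal sketch. The names `frame` and `age_column` and the sample values are made up for illustration; it checks that selecting a single column gives back a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"Name": ["Asha", "Ben"], "Age": [31, 27]})

# Selecting one column returns a Series, not a DataFrame
age_column = frame["Age"]

print(type(age_column))                        # the Series class
print(age_column.name)                         # the column label travels with the Series
print(age_column.index.equals(frame.index))    # same row index as the parent DataFrame
```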

In [1]:
import pandas as pd

In [19]:
## Series: a one-dimensional array-like object that can hold any data type. It is similar to a column in a table.

## Syntax:

import pandas as pd

data = [1,2,3,4,5]
series=pd.Series(data)          ## A Series = values + index; with no index given, it defaults to 0,1,2,3,4
print("Series \n",series)
print(type(series))


Series 
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [20]:
## Series with a customised index:

data=[4,5,6,7,8]
index=['a','b','c','d','e']                     ## The default labels 0,1,2,3,4 are replaced by a,b,c,d,e
series=pd.Series(data,index=index)
print("Series \n",series)
print(type(series))

Series 
 a    4
b    5
c    6
d    7
e    8
dtype: int64
<class 'pandas.core.series.Series'>
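
Once a Series has custom labels, its values can be read either by label or by position. The sketch below is illustrative only (the name `scores` is an assumption, not part of the cells above) and reuses the same values and labels.

```python
import pandas as pd

scores = pd.Series([4, 5, 6, 7, 8], index=["a", "b", "c", "d", "e"])

# Label-based access: look up the value stored under label "c"
print(scores["c"])

# Label-based slice: with labels, the end label "d" is included
print(scores.loc["b":"d"])

# Position-based access: first element, regardless of its label
print(scores.iloc[0])

# The two parts of a Series: its index and its values
print(scores.index.tolist())
print(scores.tolist())
```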


In [None]:
## A series from dictionary:        (datatype = int)
data={'a':1,'b':2,'c':3}
series_dict=pd.Series(data)
print(series_dict)
print(type(series_dict))

a    1
b    2
c    3
dtype: int64
<class 'pandas.core.series.Series'>


In [None]:
## A Series from a dictionary:        (datatype=object)
person={'name':'Enid','age':23,'city':'Nagpur'}     ## Named 'person' so the built-in dict is not shadowed
ser_dict=pd.Series(person)
print(ser_dict)                 ## dtype is object because the values mix different types (int and str)
print(type(ser_dict))

name      Enid
age         23
city    Nagpur
dtype: object
<class 'pandas.core.series.Series'>
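
The difference between the two dictionary examples can be checked directly through the `dtype` attribute. This is a small sketch with assumed variable names (`ints`, `mixed`); it only inspects the dtypes and does not rely on a particular integer width.

```python
import pandas as pd

ints = pd.Series({"a": 1, "b": 2, "c": 3})
mixed = pd.Series({"name": "Enid", "age": 23, "city": "Nagpur"})

# All values are integers, so pandas picks an integer dtype
print(ints.dtype)
print(pd.api.types.is_integer_dtype(ints.dtype))

# Values of different Python types are stored as generic Python objects
print(mixed.dtype)

# Each element keeps its original Python type inside an object Series
print(type(mixed["age"]), type(mixed["name"]))
```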


In [None]:
## DataFrame: a 2-dimensional labeled data structure (like a table or Excel sheet).
## Each column is a Series.
## Different columns can store different data types.

## Create a DataFrame from a dictionary of lists (each key becomes a column)
data={
    'Name':['Krish','John','Jack'],
    'Age':[25,30,45],
    'City':['Bangalore','New York','Florida']
}
df=pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age       City
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45    Florida
<class 'pandas.core.frame.DataFrame'>


In [25]:
## Create a DataFrame from a list of dictionaries (each dictionary becomes a row)
data=[
    {'Name':'Krish','Age':32,'City':'Bangalore'},
    {'Name':'John','Age':34,'City':'Bangalore'},
    {'Name':'Bappy','Age':32,'City':'Bangalore'},
    {'Name':'JAck','Age':32,'City':'Bangalore'}
    
]
df=pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age       City
0  Krish   32  Bangalore
1   John   34  Bangalore
2  Bappy   32  Bangalore
3   JAck   32  Bangalore
<class 'pandas.core.frame.DataFrame'>
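
With a list of dictionaries, the dictionaries do not all need the same keys. As a hedged sketch (the `records` list below is invented for illustration), a key missing from one dictionary shows up as a missing value (NaN) in that row.

```python
import pandas as pd

# Hypothetical records: the second one has no 'City' key
records = [
    {"Name": "Krish", "Age": 32, "City": "Bangalore"},
    {"Name": "John", "Age": 34},
]
people = pd.DataFrame(records)

print(people)

# isna() marks the cells that pandas filled in as missing
print(people["City"].isna().tolist())
```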


In [None]:
## Accessing data from a DataFrame

## 1. Accessing columns: df['Name'] - Single column → Series
##                       df[['Name','Age']] - Multiple columns → DataFrame

## 2. Accessing rows: By position (iloc): df.iloc[0] - First row
##                                        df.iloc[0:2] - First two rows (end position excluded)

##                    By label/index (loc): df.loc[0] - Row with index label 0
##                                          df.loc[0:2] - Rows with index labels 0,1,2 (end label included)


## Useful attributes and methods
#       df.columns    # Column labels (an Index object)
#       df.index      # Row index
#       df.shape      # (rows, columns)
#       df.head(3)    # First 3 rows
#       df.tail(2)    # Last 2 rows
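
The attributes and methods listed above can be tried on a small frame. This is a minimal sketch; `df_demo` is a separate, illustrative DataFrame so it does not interfere with the `df` used in the other cells.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Age": [25, 30, 45],
    "City": ["Bangalore", "New York", "Florida"],
})

print(df_demo.columns.tolist())   # column labels as a plain list
print(df_demo.index.tolist())     # row labels as a plain list
print(df_demo.shape)              # (number of rows, number of columns)
print(df_demo.head(2))            # first 2 rows
print(df_demo.tail(1))            # last row

# Single brackets give a Series, double brackets give a DataFrame
one_column = df_demo["Name"]
two_columns = df_demo[["Name", "Age"]]
print(type(one_column), type(two_columns))
```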

                        

In [44]:
data={
    'Name':['Krish','John','Jack'],
    'Age':[25,30,45],
    'City':['Bangalore','New York','Florida']
}
df=pd.DataFrame(data)
df


    Name  Age       City
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45    Florida


In [41]:
## Accessing columns: select them by name with square brackets.
df['Name']                                                  ## Name column (a Series)
df[['Name','City']]                                         ## Name and City columns (a DataFrame); only this last expression is displayed

    Name       City
0  Krish  Bangalore
1   John   New York
2   Jack    Florida


In [None]:
## Accessing rows by label (index) using loc:
## Syntax: df.loc[row_label, column_label]

df.loc[0]                       ## First row (index label 0)
df.loc[2]                       ## Third row (index label 2)
df.loc[:,'City']                ## City for all rows
df.loc[0:1,'Name']              ## Names from rows 0 and 1 (loc slices include the end label)


0    Krish
1     John
Name: Name, dtype: object

In [64]:
## Accessing rows by position using iloc:
## Syntax: df.iloc[row_position, column_position]

df.iloc[0]              ## 1st row (same as df.loc[0] here because the index labels are 0,1,2)
df.iloc[1]              ## 2nd row
df.iloc[:,1]            ## All rows of the 2nd column (position 1 = Age)
df.iloc[0,2]            ## 1st row, 3rd column (City)

'Bangalore'
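
With the default index 0,1,2, `loc` and `iloc` often return the same rows, which hides how they differ. The sketch below uses a hypothetical index of 10, 20, 30 (an assumption chosen purely for illustration) so that labels and positions no longer coincide.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Name": ["Krish", "John", "Jack"], "Age": [25, 30, 45]},
    index=[10, 20, 30],   # hypothetical labels, deliberately not 0,1,2
)

# loc works on labels, and a label slice includes its end label
by_label = df_demo.loc[10:20]       # rows labelled 10 and 20

# iloc works on positions, and a position slice excludes its end
by_position = df_demo.iloc[0:1]     # only the row at position 0

print(by_label)
print(by_position)

# The same cell reached both ways: label 20 sits at position 1
print(df_demo.loc[20, "Name"], df_demo.iloc[1, 0])
```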

In [None]:
## at and iat are faster, single-value counterparts of loc and iloc: use them when you want exactly one cell.

## at – label-based, single value
## Access a single cell using row and column labels.
## Syntax : df.at[row_label, column_label]

df.at[1,'Age']              #30
df.at[0,'Name']             #Krish

'Krish'

In [None]:
## iat – integer position-based, single value
## Access a single cell using row and column positions.
## Syntax : df.iat[row_position, column_position]

df.iat[2,0]                 #Jack
df.iat[2,2]                 #Florida
df.iat[1,2]                 #New York

'New York'
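
Besides reading a cell, `at` and `iat` can also assign to one. The following is a small sketch on a separate, illustrative frame (`df_demo` and the replacement values are assumptions), so the `df` used in the surrounding cells stays unchanged.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Age": [25, 30, 45],
    "City": ["Bangalore", "New York", "Florida"],
})

# Write one cell by row label and column label
df_demo.at[1, "Age"] = 31

# Write one cell by row position and column position
df_demo.iat[2, 2] = "Miami"

print(df_demo.at[1, "Age"])
print(df_demo.iat[2, 2])
print(df_demo)
```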

## Data manipulation in DataFrame:
## Adding a column: df['New_col_name'] = [elements of new col]
## Editing an existing column (e.g. Age + 1): df['Age'] = df['Age'] + 1
## Deleting a column: df.drop(...) returns a new DataFrame and leaves df unchanged; pass inplace=True to modify df itself

In [None]:
df

    Name  Age       City
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45    Florida


In [72]:
## Adding a column
df['Salary']=[50000,60000,70000]
df

    Name  Age       City  Salary
0  Krish   25  Bangalore   50000
1   John   30   New York   60000
2   Jack   45    Florida   70000
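
A new column does not have to come from a list. As a sketch under stated assumptions (the frame `df_demo` and the column names `Country`, `Bonus` and `Too_short` are invented for illustration), a column can also be filled from a single value or computed from an existing column, and a list of the wrong length is rejected.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Salary": [50000, 60000, 70000],
})

# A single value is repeated for every row
df_demo["Country"] = "Unknown"

# A column computed from another column (integer division keeps whole numbers)
df_demo["Bonus"] = df_demo["Salary"] // 10

# A list must have one element per row; otherwise pandas raises ValueError
try:
    df_demo["Too_short"] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = exc

print(df_demo)
print("Length mismatch raised:", type(length_error).__name__)
```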


In [82]:
## Removing a column: drop returns a new DataFrame without it; df itself is not changed
df.drop('Salary',axis=1)


    Name  Age       City
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45    Florida


In [None]:
df   ## drop above returned a new DataFrame, so df still has the Salary column.

    Name  Age       City  Salary
0  Krish   25  Bangalore   50000
1   John   30   New York   60000
2   Jack   45    Florida   70000


In [None]:
## To remove the column permanently
df.drop('Salary',axis=1,inplace=True)           ## Modifies df in place, so Salary is now gone
df

    Name  Age       City
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45    Florida
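
An alternative to `inplace=True` is to keep the DataFrame that `drop` returns. The sketch below uses illustrative names (`df_demo`, `trimmed`) and the `columns=` keyword of `drop`, which does the same job as `axis=1`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Age": [25, 30, 45],
    "Salary": [50000, 60000, 70000],
})

# drop returns a new DataFrame; store it under a new name (or reassign df_demo)
trimmed = df_demo.drop(columns="Salary")

print("Salary" in df_demo.columns)   # the original still has the column
print("Salary" in trimmed.columns)   # the returned copy does not
```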


In [88]:
df.drop(0,inplace=True)    ## Drops the row whose index label is 0 (default axis=0 means rows, matched by label)
df

   Name  Age      City
1  John   30  New York
2  Jack   45   Florida


In [89]:
## Add 1 to every value in the Age column
df['Age']=df['Age']+1
df                                  # Each Age value is now 1 higher than before

   Name  Age      City
1  John   31  New York
2  Jack   46   Florida
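
After a row is dropped, the remaining rows keep their old labels, so the index starts at 1 instead of 0. If a fresh 0-based index is wanted, `reset_index(drop=True)` is one way to get it. This is a hedged sketch on a separate, illustrative frame (`df_demo`, `remaining` and `renumbered` are assumed names).

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Krish", "John", "Jack"],
    "Age": [25, 30, 45],
})

# Drop the row labelled 0; the other rows keep labels 1 and 2
remaining = df_demo.drop(index=0)
print(remaining.index.tolist())

# Renumber the rows from 0; drop=True discards the old labels instead of keeping them as a column
renumbered = remaining.reset_index(drop=True)
print(renumbered.index.tolist())

# Column arithmetic applies to every remaining row at once
renumbered["Age"] = renumbered["Age"] + 1
print(renumbered)
```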
