## Introduction to Pandas
### What is Pandas?
#### Pandas is a Python library used for data manipulation, analysis, and cleaning. It makes working with "labeled" or "relational" data easier, more efficient, and more intuitive.

### Importing Pandas

In [1]:
import pandas as pd

### Data Structures in Pandas
#### Pandas has two main data structures for storing data: Series and DataFrame.

### Pandas Series
#### A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index. A Series can be created from in-memory data such as a list, a dict, or a NumPy array, or obtained as a column of data loaded from existing storage such as a SQL database, a CSV file, or an Excel file.


In [2]:
data=["Kathmandu","Pokhara","Lalitpur","Dhangadhi"]
data

['Kathmandu', 'Pokhara', 'Lalitpur', 'Dhangadhi']

In [3]:
s=pd.Series(data)
s

0    Kathmandu
1      Pokhara
2     Lalitpur
3    Dhangadhi
dtype: object

In [4]:
data=["Saurav","Lalipur","Bachelor","Engineering"]
data

['Saurav', 'Lalipur', 'Bachelor', 'Engineering']

In [5]:
s=pd.Series(data)
s

0         Saurav
1        Lalipur
2       Bachelor
3    Engineering
dtype: object

In [6]:
s[0]

'Saurav'

In [7]:
s[3]

'Engineering'

In [8]:
s.values

array(['Saurav', 'Lalipur', 'Bachelor', 'Engineering'], dtype=object)

In [9]:
s.index

RangeIndex(start=0, stop=4, step=1)

In [10]:
s.dtype

dtype('O')
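
The Series above uses the default integer index (0, 1, 2, ...). As a minimal sketch of custom axis labels, the example below builds a Series from a dict, reusing a few names and ages from this notebook; the variable name `ages` is only illustrative.

```python
import pandas as pd

# The dict keys become the index labels, the dict values become the data
ages = pd.Series({"Saurav": 20, "Shyam": 17, "Gita": 14}, name="Age")

print(ages.index.tolist())  # labels instead of 0, 1, 2
print(ages["Shyam"])        # label-based lookup
print(ages.iloc[0])         # position-based lookup
```

With a labeled index, `.iloc` keeps positional access explicit, while plain square brackets look values up by label.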

### Pandas DataFrame
#### A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is like a spreadsheet, a SQL table, or a dictionary of Series objects, and it is the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input, for example a dictionary of lists:

In [11]:
data={
    "Name":["Saurav","Shyam","Gita","Hari","Ram"],
    "Age":[20,17,14,24,28],
    "City":["Lalipur","Pokhara","Kailali","Kathmandu","Bhaktapur"],
    "Degree":["Bachelor","Isc","SEE","Masters","Phd"],
    "Subject":["Engineering","Science","Mathematics","Physics","Sociology"]
}
data

{'Name': ['Saurav', 'Shyam', 'Gita', 'Hari', 'Ram'],
 'Age': [20, 17, 14, 24, 28],
 'City': ['Lalipur', 'Pokhara', 'Kailali', 'Kathmandu', 'Bhaktapur'],
 'Degree': ['Bachelor', 'Isc', 'SEE', 'Masters', 'Phd'],
 'Subject': ['Engineering', 'Science', 'Mathematics', 'Physics', 'Sociology']}

In [12]:
df=pd.DataFrame(data)
df

     Name  Age       City    Degree      Subject
0  Saurav   20    Lalipur  Bachelor  Engineering
1   Shyam   17    Pokhara       Isc      Science
2    Gita   14    Kailali       SEE  Mathematics
3    Hari   24  Kathmandu   Masters      Physics
4     Ram   28  Bhaktapur       Phd    Sociology
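
A dictionary of lists is only one of the inputs `pd.DataFrame` accepts. As a rough sketch, the same small table can also be built from a list of row dicts or from a dict of Series; the names `rows`, `df_rows`, and `df_cols` are illustrative and not part of the notebook.

```python
import pandas as pd

# One dict per row; the keys become column names
rows = [
    {"Name": "Saurav", "Age": 20},
    {"Name": "Shyam", "Age": 17},
]
df_rows = pd.DataFrame(rows)

# One Series per column; the Series are aligned on their index labels
df_cols = pd.DataFrame({
    "Name": pd.Series(["Saurav", "Shyam"]),
    "Age": pd.Series([20, 17]),
})

# Both routes produce the same two-row, two-column table
print(df_rows.equals(df_cols))
```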


In [13]:
df.index

RangeIndex(start=0, stop=5, step=1)

In [14]:
df.shape

(5, 5)

In [15]:
df.columns

Index(['Name', 'Age', 'City', 'Degree', 'Subject'], dtype='object')

In [16]:
df['Name']  # A single column label returns a Series

0    Saurav
1     Shyam
2      Gita
3      Hari
4       Ram
Name: Name, dtype: object

In [17]:
df[['Name','City']]  # A list of column labels returns a DataFrame

     Name       City
0  Saurav    Lalipur
1   Shyam    Pokhara
2    Gita    Kailali
3    Hari  Kathmandu
4     Ram  Bhaktapur
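
Square brackets select columns. Rows are usually selected with `.loc` (by label) or `.iloc` (by position). The sketch below uses a small stand-in frame called `df_demo`, an illustrative name, so that it runs on its own.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Saurav", "Shyam", "Gita"],
    "City": ["Lalipur", "Pokhara", "Kailali"],
})

first_row = df_demo.iloc[0]            # first row by position, returned as a Series
city_of_gita = df_demo.loc[2, "City"]  # row label 2, column "City"
subset = df_demo.loc[0:1, ["Name"]]    # label slices include both endpoints

print(first_row["Name"], city_of_gita, subset.shape)
```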


In [18]:
df.head(2)

     Name  Age     City    Degree      Subject
0  Saurav   20  Lalipur  Bachelor  Engineering
1   Shyam   17  Pokhara       Isc      Science


In [19]:
df.tail()

     Name  Age       City    Degree      Subject
0  Saurav   20    Lalipur  Bachelor  Engineering
1   Shyam   17    Pokhara       Isc      Science
2    Gita   14    Kailali       SEE  Mathematics
3    Hari   24  Kathmandu   Masters      Physics
4     Ram   28  Bhaktapur       Phd    Sociology


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Name     5 non-null      object
 1   Age      5 non-null      int64 
 2   City     5 non-null      object
 3   Degree   5 non-null      object
 4   Subject  5 non-null      object
dtypes: int64(1), object(4)
memory usage: 332.0+ bytes


In [21]:
df.describe()

             Age
count   5.000000
mean   20.600000
std     5.549775
min    14.000000
25%    17.000000
50%    20.000000
75%    24.000000
max    28.000000


In [22]:
df['Age'].sum()

np.int64(103)

### Basic Data Selection & Filtering

In [23]:
df[df['Age']>=20]

     Name  Age       City    Degree      Subject
0  Saurav   20    Lalipur  Bachelor  Engineering
3    Hari   24  Kathmandu   Masters      Physics
4     Ram   28  Bhaktapur       Phd    Sociology


In [24]:
df[(df['Age']==20) & (df['Name']=="Saurav")]

     Name  Age     City    Degree      Subject
0  Saurav   20  Lalipur  Bachelor  Engineering
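
Boolean masks can also be combined with `|` (or), negated with `~`, or built with `isin`, and `DataFrame.query` expresses the same kind of condition as a string. A minimal sketch, using a self-contained copy of three of the columns under the illustrative name `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Saurav", "Shyam", "Gita", "Hari", "Ram"],
    "Age": [20, 17, 14, 24, 28],
    "City": ["Lalipur", "Pokhara", "Kailali", "Kathmandu", "Bhaktapur"],
})

# Each condition needs its own parentheses because | and & bind tightly
young_or_old = df_demo[(df_demo["Age"] < 15) | (df_demo["Age"] > 25)]

# ~ inverts a boolean mask
not_pokhara = df_demo[~(df_demo["City"] == "Pokhara")]

# isin() tests membership in a list of values
selected_cities = df_demo[df_demo["City"].isin(["Kathmandu", "Bhaktapur"])]

# query() takes the condition as a string
adults = df_demo.query("Age >= 18")

print(young_or_old["Name"].tolist())
print(adults["Name"].tolist())
```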


### Handling Missing Data

In [25]:
df.isnull()

    Name    Age   City  Degree  Subject
0  False  False  False   False    False
1  False  False  False   False    False
2  False  False  False   False    False
3  False  False  False   False    False
4  False  False  False   False    False


In [26]:
df.notnull()

   Name   Age  City  Degree  Subject
0  True  True  True    True     True
1  True  True  True    True     True
2  True  True  True    True     True
3  True  True  True    True     True
4  True  True  True    True     True


In [27]:
df.isna()

    Name    Age   City  Degree  Subject
0  False  False  False   False    False
1  False  False  False   False    False
2  False  False  False   False    False
3  False  False  False   False    False
4  False  False  False   False    False
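
The DataFrame in this notebook has no missing values, so every check above returns the same answer for all cells. The sketch below uses a hypothetical frame, `df_missing`, with two gaps added on purpose to show counting, dropping, and filling missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing age and one missing city
df_missing = pd.DataFrame({
    "Name": ["Saurav", "Shyam", "Gita"],
    "Age": [20, np.nan, 14],
    "City": ["Lalipur", "Pokhara", None],
})

# Number of missing values in each column
missing_per_column = df_missing.isna().sum()

# Keep only the rows that have no missing values
complete_rows = df_missing.dropna()

# Fill each column with its own replacement value
filled = df_missing.fillna({"Age": df_missing["Age"].mean(), "City": "Unknown"})

print(missing_per_column)
print(filled)
```

Neither `dropna` nor `fillna` modifies `df_missing`; both return a new DataFrame.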


### Adding / Modifying Columns

In [28]:
df['Age']=[20,21,22,23,24]
df

     Name  Age       City    Degree      Subject
0  Saurav   20    Lalipur  Bachelor  Engineering
1   Shyam   21    Pokhara       Isc      Science
2    Gita   22    Kailali       SEE  Mathematics
3    Hari   23  Kathmandu   Masters      Physics
4     Ram   24  Bhaktapur       Phd    Sociology


In [29]:
df['Age']=df['Age']*1.2
df

     Name   Age       City    Degree      Subject
0  Saurav  24.0    Lalipur  Bachelor  Engineering
1   Shyam  25.2    Pokhara       Isc      Science
2    Gita  26.4    Kailali       SEE  Mathematics
3    Hari  27.6  Kathmandu   Masters      Physics
4     Ram  28.8  Bhaktapur       Phd    Sociology
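
Both cells above overwrite an existing column. Assigning to a column name that does not exist yet adds a new column instead. A minimal sketch; the column names `IsAdult`, `Country`, and `AgeNextYear` are made up for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({"Name": ["Saurav", "Shyam", "Gita"], "Age": [20, 17, 14]})

df_demo["IsAdult"] = df_demo["Age"] >= 18  # new column computed from an existing one
df_demo["Country"] = "Nepal"               # a scalar is repeated for every row

# assign() returns a new DataFrame and leaves df_demo unchanged
with_next_year = df_demo.assign(AgeNextYear=df_demo["Age"] + 1)

print(df_demo.columns.tolist())
print(with_next_year.columns.tolist())
```

Multiplying an integer column by a float, as in `df['Age']*1.2` above, changes the column's dtype from `int64` to `float64`, which is why the ages are then displayed with a decimal point.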


### Basic Aggregation & Grouping

In [30]:
df['Age'].min()

np.float64(24.0)

In [31]:
df['Age'].max()

np.float64(28.799999999999997)

In [32]:
df['Age'].mean()

np.float64(26.4)

In [33]:
df['Age'].ge(25)  # Element-wise equivalent of df['Age'] >= 25

0    False
1     True
2     True
3     True
4     True
Name: Age, dtype: bool

In [34]:
df[df['Age'].ge(25)]

    Name   Age       City   Degree      Subject
1  Shyam  25.2    Pokhara      Isc      Science
2   Gita  26.4    Kailali      SEE  Mathematics
3   Hari  27.6  Kathmandu  Masters      Physics
4    Ram  28.8  Bhaktapur      Phd    Sociology
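
The cells above aggregate a whole column. Grouping applies the same aggregations once per category with `groupby`. Every city in this notebook's data appears only once, so the sketch below uses hypothetical rows in which cities repeat; `df_demo` and its values are for illustration only.

```python
import pandas as pd

# Hypothetical rows: cities repeat so that each group has several members
df_demo = pd.DataFrame({
    "City": ["Pokhara", "Kathmandu", "Pokhara", "Kathmandu", "Pokhara"],
    "Age": [20, 17, 14, 24, 28],
})

# One mean per city; the result is a Series indexed by city
mean_age = df_demo.groupby("City")["Age"].mean()

# Several aggregations at once; the result is a DataFrame with one row per city
summary = df_demo.groupby("City")["Age"].agg(["count", "min", "max"])

print(mean_age)
print(summary)
```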


### Reading and Writing Data

In [36]:
df

     Name   Age       City    Degree      Subject
0  Saurav  24.0    Lalipur  Bachelor  Engineering
1   Shyam  25.2    Pokhara       Isc      Science
2    Gita  26.4    Kailali       SEE  Mathematics
3    Hari  27.6  Kathmandu   Masters      Physics
4     Ram  28.8  Bhaktapur       Phd    Sociology


In [44]:
# Write the DataFrame without its index column, then read the file back
df.to_csv("../data/data.csv",index=False)
file=pd.read_csv("../data/data.csv")
file

     Name   Age       City    Degree      Subject
0  Saurav  24.0    Lalipur  Bachelor  Engineering
1   Shyam  25.2    Pokhara       Isc      Science
2    Gita  26.4    Kailali       SEE  Mathematics
3    Hari  27.6  Kathmandu   Masters      Physics
4     Ram  28.8  Bhaktapur       Phd    Sociology


In [45]:
file['Name']

0    Saurav
1     Shyam
2      Gita
3      Hari
4       Ram
Name: Name, dtype: object

In [49]:
file['Age'].eq(24)  # Element-wise equivalent of file['Age'] == 24

0     True
1    False
2    False
3    False
4    False
Name: Age, dtype: bool
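
The `index=False` argument in the `to_csv` call above matters for the round trip. As a small sketch, the example below writes a two-row stand-in frame to a temporary directory (so it does not depend on a `../data` folder existing) and compares reading it back with and without the index column; `df_demo` and the temporary path are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_demo = pd.DataFrame({"Name": ["Saurav", "Shyam"], "Age": [24.0, 25.2]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"

    # index=False: only the data columns are written
    df_demo.to_csv(path, index=False)
    without_index = pd.read_csv(path)

    # Default index=True: the index is written as an unnamed first column
    df_demo.to_csv(path)
    with_index = pd.read_csv(path)

    # index_col=0 tells read_csv to use that first column as the index again
    restored = pd.read_csv(path, index_col=0)

print(without_index.columns.tolist())
print(with_index.columns.tolist())
print(restored.columns.tolist())
```

When the index is written but not declared on read, pandas names the extra column `Unnamed: 0`, which is a common surprise after a CSV round trip.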