# Pandas - Introduction

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

### Why Pandas?

Pandas allows us to analyze large data sets and draw conclusions from them using statistical methods.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

### Where is the Pandas Codebase?

The source code for Pandas is hosted in this GitHub repository: https://github.com/pandas-dev/pandas


## Installing Pandas Library

In [2]:
pip install pandas


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m23.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Importing Pandas

In [3]:
import pandas as pd

In [4]:
person = {
    "first": ['Ajul', 'Vivek', 'Allen'],
    "last": ['Thomas', 'Thomas', 'Mathews'],
    'email': ['ajt@gmail.com', 'vit@gmail.com', 'ajm@gmail.com']
}

people_df = pd.DataFrame(person)

In [5]:
people_df

   first     last          email
0   Ajul   Thomas  ajt@gmail.com
1  Vivek   Thomas  vit@gmail.com
2  Allen  Mathews  ajm@gmail.com


In [6]:
people_df.shape

(3, 3)

In [7]:
people_df.size

9
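
The two attributes above are related: `shape` is a `(rows, columns)` tuple and `size` is the total number of cells, so `size` equals rows times columns. A minimal sketch that rebuilds the same sample data and checks this (the variable names `n_rows`, `n_cols`, and `total_cells` are chosen here for illustration):

```python
import pandas as pd

# Same sample data as in the notebook
people_df = pd.DataFrame({
    "first": ["Ajul", "Vivek", "Allen"],
    "last": ["Thomas", "Thomas", "Mathews"],
    "email": ["ajt@gmail.com", "vit@gmail.com", "ajm@gmail.com"],
})

n_rows, n_cols = people_df.shape   # shape is a (rows, columns) tuple
total_cells = people_df.size       # size is the total number of cells

print(n_rows, n_cols, total_cells)        # 3 3 9
print(total_cells == n_rows * n_cols)     # True
print(len(people_df))                     # 3, len() counts rows only
```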

In [8]:

people_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   first   3 non-null      object
 1   last    3 non-null      object
 2   email   3 non-null      object
dtypes: object(3)
memory usage: 200.0+ bytes


In [9]:
people_df['email']

0    ajt@gmail.com
1    vit@gmail.com
2    ajm@gmail.com
Name: email, dtype: object

In [10]:
type(people_df['email'])

pandas.core.series.Series

In [11]:
# dot notation also works, but bracket notation is safer: it handles column names
# that contain spaces or that clash with DataFrame attributes such as size or shape

people_df.email

0    ajt@gmail.com
1    vit@gmail.com
2    ajm@gmail.com
Name: email, dtype: object
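
To illustrate where dot notation can mislead, here is a small sketch using a hypothetical frame (not the data from this notebook) with one column named `size`, which clashes with the `DataFrame.size` attribute, and one column name that contains a space:

```python
import pandas as pd

# Hypothetical frame with awkward column names, for illustration only
df = pd.DataFrame({
    "size": ["S", "M", "L"],                    # clashes with DataFrame.size
    "first name": ["Ajul", "Vivek", "Allen"],   # contains a space
})

bracket_size = df["size"]   # the column, returned as a Series
dot_size = df.size          # NOT the column: the total number of cells

print(type(bracket_size).__name__)   # Series
print(dot_size)                      # 6
print(df["first name"].tolist())     # only bracket notation can reach this column
```

With dot notation, `df.size` silently returns the cell count instead of the column, and `df.first name` is not valid Python at all, which is why bracket notation is the more robust default.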

In [12]:
# retrieving multiple columns from a dataframe at once
# (pass a list of column names to select a subset of columns)

people_df[['first','email']]

   first          email
0   Ajul  ajt@gmail.com
1  Vivek  vit@gmail.com
2  Allen  ajm@gmail.com
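
Note the difference in return type between the two bracket forms: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds only one name. A short sketch on the same sample data (the names `one_col` and `one_col_df` are chosen here for illustration):

```python
import pandas as pd

# Same sample data as in the notebook
people_df = pd.DataFrame({
    "first": ["Ajul", "Vivek", "Allen"],
    "last": ["Thomas", "Thomas", "Mathews"],
    "email": ["ajt@gmail.com", "vit@gmail.com", "ajm@gmail.com"],
})

one_col = people_df["email"]        # single brackets: a Series
one_col_df = people_df[["email"]]   # list with one name: a one-column DataFrame

print(type(one_col).__name__)       # Series
print(type(one_col_df).__name__)    # DataFrame
print(one_col_df.shape)             # (3, 1)
```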


In [13]:
# retrieve the list of column names

people_df.columns

Index(['first', 'last', 'email'], dtype='object')

In [14]:
# retrieve a specific row by integer position

people_df.iloc[0]

first             Ajul
last            Thomas
email    ajt@gmail.com
Name: 0, dtype: object

In [15]:
# retrieve a list of rows from data frame

people_df.iloc[[0,1]]

   first    last          email
0   Ajul  Thomas  ajt@gmail.com
1  Vivek  Thomas  vit@gmail.com


In [16]:
# retrieve specific rows and columns from df by integer position

people_df.iloc[[0,2], 2]

0    ajt@gmail.com
2    ajm@gmail.com
Name: email, dtype: object

In [17]:
people_df.iloc[[0,2], [0,1]]

   first     last
0   Ajul   Thomas
2  Allen  Mathews


In [18]:
# retrieving data from df using loc (label-based indexing)

people_df.loc[0]

first             Ajul
last            Thomas
email    ajt@gmail.com
Name: 0, dtype: object

In [19]:
people_df.loc[[0,1],['first', 'last']]

   first    last
0   Ajul  Thomas
1  Vivek  Thomas
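
With the default `RangeIndex`, `loc` and `iloc` look interchangeable because the row labels happen to equal the row positions. The following sketch makes the difference visible by using the `email` column as the index (an assumption made here purely for illustration) and by comparing how the two handle slices:

```python
import pandas as pd

# Same sample data as in the notebook
people_df = pd.DataFrame({
    "first": ["Ajul", "Vivek", "Allen"],
    "last": ["Thomas", "Thomas", "Mathews"],
    "email": ["ajt@gmail.com", "vit@gmail.com", "ajm@gmail.com"],
})

# Assumption for illustration: use the email column as the row labels
by_email = people_df.set_index("email")

row_by_label = by_email.loc["vit@gmail.com"]   # loc looks rows up by label
row_by_position = by_email.iloc[1]             # iloc looks rows up by position

print(row_by_label.equals(row_by_position))    # True, both are the second row

# Slices differ too: loc includes the end label, iloc excludes the end position
print(len(people_df.loc[0:1]))    # 2
print(len(people_df.iloc[0:1]))   # 1
```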


In [22]:
# accessing data from pandas DataFrame

# bracket notation (preferred method)

print(people_df[['first', 'last']])



   first     last
0   Ajul   Thomas
1  Vivek   Thomas
2  Allen  Mathews
