# Jupyter Notebook

Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. It's a popular tool among data scientists, researchers, and developers for data analysis, machine learning, and prototyping.

# What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
# Why Use Pandas?
Pandas lets us analyze large data sets and draw conclusions from them using statistical methods.
Pandas can also clean messy data sets and make them readable and relevant.
# How to install Pandas?
You can install Pandas with the command `pip install pandas`.
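
To confirm that the installation worked, we can import the library and print its version. This is only a minimal check; the version string you see depends on your environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string varies by environment.
print(pd.__version__)
```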


# Series
A Pandas Series is a one-dimensional labeled array that can store values of various data types. We can easily convert a list, tuple, or dictionary into a Series using the `pd.Series()` constructor.
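
As a small sketch of the conversions mentioned above, the following builds a Series from a list, a tuple, and a dictionary. The variable names and values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Illustrative values; any list, tuple, or dict can be passed the same way.
from_list = pd.Series([10, 20, 30])        # gets a default integer index 0, 1, 2
from_tuple = pd.Series((1.5, 2.5))         # a tuple behaves like a list here
from_dict = pd.Series({"a": 1, "b": 2})    # the dict keys become the index labels

print(from_list)
print(from_tuple)
print(from_dict)
```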

In [1]:
import pandas as pd
# Create a Series with no data (an explicit dtype avoids a warning on some pandas versions)
x = pd.Series(dtype=object)
print(x)

Series([], dtype: object)


# Creating a Series using inputs
We can create a Series from various inputs: an array, a dict, or a scalar value.

In [2]:
#Creating Series from Array
import pandas as pd  
import numpy as np  
i = np.array(['P','a','n','d','a','s'])  
a = pd.Series(i)  
print(a)   

0    P
1    a
2    n
3    d
4    a
5    s
dtype: object


In [3]:
#Create a Series from dict
import pandas as pd  
info = {'x' : 0., 'y' : 1., 'z' : 2.}  
a = pd.Series(info)  
print (a)  

x    0.0
y    1.0
z    2.0
dtype: float64


In [4]:
#Create a Series using Scalar:
import pandas as pd   
x = pd.Series(4, index=[0, 1, 2, 3])  
print (x)

0    4
1    4
2    4
3    4
dtype: int64


# Series object attributes
Series.index -- Returns the index (the row labels) of the Series.<br>
Series.values -- Returns the data as a NumPy array.<br>
Series.shape -- Returns a tuple giving the shape of the data.<br>
Series.dtype -- Returns the data type of the data.<br>
Series.size -- Returns the number of elements in the data.<br>
Series.empty -- Returns True if the Series is empty, otherwise False.<br>
Series.ndim -- Returns the number of dimensions of the data (always 1 for a Series).<br>
Series.itemsize -- Returned the size in bytes of the item data type. It was removed in pandas 1.0; use Series.dtype.itemsize instead.<br>

In [4]:
import pandas as pd   
x=pd.Series(data=[2,4,6,8])   
y=pd.Series(data=[11.2,18.6,22.5], index=['a','b','c'])   
print(y.index)   
print(x.values)   
print(x.dtype)
print(x.size)
print(x.ndim)
print(x.shape)
print(x.empty)

Index(['a', 'b', 'c'], dtype='object')
[2 4 6 8]
int64
4
1
(4,)
False
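
The attribute list mentions Series.itemsize, which is not available in pandas 1.0 and later. A minimal sketch of how the same information can be read from the dtype, assuming an explicit int64 Series so the sizes are the same on every platform:

```python
import pandas as pd

# dtype is set explicitly so the byte sizes do not depend on the platform default.
x = pd.Series([2, 4, 6, 8], dtype="int64")

print(x.dtype.itemsize)  # size in bytes of a single element
print(x.nbytes)          # total bytes used by the underlying data
```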


In [6]:
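# Accessing elements of a Series and slicing
import pandas as pd
x = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(x.iloc[0])          # first element by position (plain x[0] is deprecated for label indexes)
print(x[:3])              # first three elements
print(x[-3:])             # last three elements
print(x[::-1])            # all elements in reverse order
print(x['a'])             # single element by label
print(x[['a','c','d']])   # several elements by a list of labels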

1
a    1
b    2
c    3
dtype: int64
c    3
d    4
e    5
dtype: int64
e    5
d    4
c    3
b    2
a    1
dtype: int64
1
a    1
c    3
d    4
dtype: int64
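
Position-based and label-based access are easy to mix up, so here is a short sketch that uses the explicit `iloc` and `loc` indexers on the same Series. The variable names are illustrative.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

first = x.iloc[0]              # by position
by_label = x.loc["a"]          # by label
pos_slice = x.iloc[0:2]        # positions 0 and 1; the end position is excluded
label_slice = x.loc["a":"c"]   # labels a, b and c; the end label is included

print(first, by_label)
print(pos_slice)
print(label_slice)
```

Note the difference in the two slices: a position slice stops before its end, while a label slice includes it.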


In [7]:
# Viewing/Inspecting Data
# head(n): returns the first n rows of a Series or DataFrame (n defaults to 5).
# tail(n): returns the last n rows of a Series or DataFrame (n defaults to 5).

import pandas as pd
x = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print (x.head(2))
print (x.tail(4))

a    1
b    2
dtype: int64
b    2
c    3
d    4
e    5
dtype: int64


# What is DataFrame?
A Pandas DataFrame is a widely used two-dimensional data structure with labeled rows and columns. DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel.
# Create a DataFrame
We can create a DataFrame in the following ways:<br>
Dict<br>
Lists<br>
NumPy ndarrays<br>
Series

In [3]:
#Create an empty DataFrame

import pandas as pd
df = pd.DataFrame()
print("Empty DataFrame")
print (df)

#Create a DataFrame from Lists
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print("DataFrame from Lists")
print (df)

#Create a DataFrame from Dict of ndarrays / Lists
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print("DataFrame from Dict")
print (df)

#Create a DataFrame from List of Dicts
import pandas as pd
data = [{'a': [1,2], 'b': [3,4]},{'a': [5,4], 'b': [6,4], 'c': [7,4]}]
df = pd.DataFrame(data)
print("DataFrame from List of Dicts")
print (df)

#Creation of DataFrame from NumPy ndarrays
import pandas as pd
import numpy as np
array1 = np.array([10,20, 30])
array2 = np.array([100,200, 300])
array3 = np.array([-10,-20,-30, -40])
df = pd.DataFrame([array1,array2,array3])
print("DataFrame from NumPy ndarrays")
print (df)


Empty DataFrame
Empty DataFrame
Columns: []
Index: []
DataFrame from Lists
   0
0  1
1  2
2  3
3  4
4  5
DataFrame from Dict
    Name  Age
0    Tom  28
1   Jack  34
2  Steve  29
3  Ricky  42
DataFrame from List of Dicts
        a       b       c
0  [1, 2]  [3, 4]     NaN
1  [5, 4]  [6, 4]  [7, 4]
DataFrame from NumPy ndarrays
     0    1    2     3
0   10   20   30   NaN
1  100  200  300   NaN
2  -10  -20  -30 -40.0
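
The list of creation methods also names Series, which the cell above does not cover. A hedged sketch, assuming two small Series with partly different labels (the `City` column and its values are made up for illustration):

```python
import pandas as pd

# Two Series whose indexes only partly overlap.
age = pd.Series([28, 34], index=["Tom", "Jack"])
city = pd.Series(["Delhi", "Pune", "Chennai"], index=["Tom", "Jack", "Steve"])

# Each Series becomes a column; rows are aligned on the index labels,
# and labels missing from one Series are filled with NaN.
df = pd.DataFrame({"Age": age, "City": city})
print(df)
```

Because "Steve" has no entry in `age`, that cell is NaN in the resulting DataFrame.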


In [18]:
#Column Selection and slicing
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print (df['Name'])

#Column Addition
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
df['Address'] = ['Delhi','Bangalore','Chennai','Pune']
print (df)
print ("Add new column using existing DataFrame columns")
df['Key']=df['Name']+df['Address']  
print (df)
# Deleting columns
print ("Delete the key column:")  
del df['Key']  
print (df)  


0      Tom
1     Jack
2    Steve
3    Ricky
Name: Name, dtype: object
    Name  Age    Address
0    Tom   28      Delhi
1   Jack   34  Bangalore
2  Steve   29    Chennai
3  Ricky   42       Pune
Add new column using existing DataFrame columns
    Name  Age    Address            Key
0    Tom   28      Delhi       TomDelhi
1   Jack   34  Bangalore  JackBangalore
2  Steve   29    Chennai   SteveChennai
3  Ricky   42       Pune      RickyPune
Delete the key column:
    Name  Age    Address
0    Tom   28      Delhi
1   Jack   34  Bangalore
2  Steve   29    Chennai
3  Ricky   42       Pune
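
Besides `del`, columns can also be removed with `drop()` and `pop()`. The following is a small sketch on a shortened version of the same data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Jack"],
    "Age": [28, 34],
    "Address": ["Delhi", "Bangalore"],
})

# drop() returns a new DataFrame and leaves df unchanged.
without_address = df.drop(columns=["Address"])

# pop() removes the column from df in place and returns it as a Series.
ages = df.pop("Age")

print(without_address)
print(ages)
print(df)
```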


In [8]:
# Row Selection:
# Rows can be selected by passing a row label to the loc indexer.
# Rows can be selected by passing an integer position to the iloc indexer.

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
print(df.loc[0])    # select the row with label 0
print(df.loc[0:2])  # label slice: both endpoints are included (rows 0, 1 and 2)
print(df.iloc[2])   # select the row at position 2


    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
Name    Tom
Age      28
Name: 0, dtype: object
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
Name    Steve
Age        29
Name: 2, dtype: object
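
The output above shows that `df.loc[0:2]` returns three rows. A short sketch comparing it with the equivalent `iloc` slice, plus row selection by a condition (the variable names are illustrative):

```python
import pandas as pd

data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
df = pd.DataFrame(data)

by_label = df.loc[0:2]        # labels 0, 1 and 2: the end label is included
by_position = df.iloc[0:2]    # positions 0 and 1: the end position is excluded
over_30 = df[df["Age"] > 30]  # boolean mask keeps only the matching rows

print(len(by_label), len(by_position))
print(over_30)
```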


In [42]:
#Row Addition
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df1 = pd.DataFrame(data)
df2 = pd.DataFrame([['Ajay', 23], ['Rahul', 24]], columns = ['Name', 'Age'])
df1 = pd.concat([df1,df2])
df1 = df1.reset_index(drop=True)
print (df1)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
4   Ajay   23
5  Rahul   24
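
For a single new row, assigning through `loc` is a possible alternative to `pd.concat`. This sketch assumes the DataFrame has the default integer index, so that `len(df)` is a label that is not yet in use.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jack"], "Age": [28, 34]})

# With the default index 0..n-1, len(df) is the next free label,
# so this assignment adds a row instead of overwriting one.
df.loc[len(df)] = ["Ajay", 23]
print(df)
```

For adding many rows at once, `pd.concat` as shown in the cell above is usually the better fit, since each `loc` assignment enlarges the DataFrame separately.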


In [41]:
#Row Deletion
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df1 = pd.DataFrame(data)
print(df1)
df1 = df1.drop([0,1])
print (df1)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
    Name  Age
2  Steve   29
3  Ricky   42
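
Rows can also be removed by a condition rather than by label. A minimal sketch on the same data, showing two equivalent ways to keep only the rows with an age below 30 (the threshold is an arbitrary choice for illustration):

```python
import pandas as pd

data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
df = pd.DataFrame(data)

# Option 1: keep the rows that match a boolean mask.
under_30 = df[df["Age"] < 30]

# Option 2: look up the labels of the unwanted rows and drop them.
dropped = df.drop(df[df["Age"] >= 30].index)

print(under_30)
print(dropped)
```

Both options return a new DataFrame; `df` itself still has all four rows.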


# Statistics functions
The following statistics functions can be applied to a Series or a DataFrame:

df.describe(): Returns summary statistics such as count, mean, std, and percentiles for the numerical columns. For non-numeric data it reports count, unique, top, and freq instead.<br>
df.mean(): Returns the mean of the values along the requested axis (by default, the mean of each column).<br>
df.count(): Returns the number of non-NA values in each column or row.<br>
df.max(): Returns the highest value in each column.<br>
df.min(): Returns the lowest value in each column.<br>
df.median(): Returns the median of each column.<br>
df.std(): Returns the sample standard deviation of each column (ddof=1 by default).<br>

In [40]:
# Use of the describe() function
import numpy as np
import pandas as pd
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s.describe())

count     4
unique    4
top       a
freq      1
dtype: object


In [43]:
# Use of the mean() function
import numpy as np
import pandas as pd
data = np.array([1,2,3,4,5])
s = pd.Series(data)
print (s.mean())


3.0


In [44]:
# Use of the count() function
import numpy as np
import pandas as pd
data = np.array([1,2,3,4,5])
s = pd.Series(data)
print (s.count())


5


In [45]:
# Use of the min() and max() functions
import numpy as np
import pandas as pd
data = np.array([1,2,3,4,5])
s = pd.Series(data)
print (s.min())
print (s.max())

1
5


In [46]:
# Use of the median() function
import numpy as np
import pandas as pd
data = np.array([1,2,3,4,5])
s = pd.Series(data)
print (s.median())


3.0


In [47]:
# Use of the std() function
import numpy as np
import pandas as pd
data = np.array([1,2,3,4,5])
s = pd.Series(data)
print (s.std())


1.5811388300841898
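
The cells above apply these functions to a Series. As a hedged sketch, the same functions can be applied to a DataFrame; here `numeric_only=True` is passed so that the text column `Name` is skipped.

```python
import pandas as pd

data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
df = pd.DataFrame(data)

age_mean = df["Age"].mean()                # statistics of a single column
column_means = df.mean(numeric_only=True)  # one value per numeric column
summary = df.describe()                    # summary table for numeric columns only

print(age_mean)
print(column_means)
print(summary)
```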


# Functions for Importing Data
pd.read_csv(filename): Reads data from a CSV file.<br>
pd.read_table(filename): Reads data from a delimited text file (tab-delimited by default).<br>
pd.read_excel(filename): Reads data from an Excel file.<br>
pd.read_sql(query, connection_object): Reads data from a SQL table or database.<br>
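
The reading functions accept file-like objects as well as filenames, which lets us try them without a file on disk. This is a small sketch using in-memory text; the CSV text is a made-up fragment with three columns, and the tab-separated text is purely illustrative.

```python
import io
import pandas as pd

# In-memory stand-ins for files (assumed sample data, not a real file).
csv_text = "month_number,facecream,facewash\n1,2500,1500\n2,2630,1200\n"
tsv_text = "a\tb\n1\t2\n"

df = pd.read_csv(io.StringIO(csv_text))     # comma-separated by default
tab = pd.read_table(io.StringIO(tsv_text))  # tab-separated by default

print(df)
print(tab)
```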

In [1]:
import pandas  
df = pandas.read_csv('company_sales_data.csv')  
print(df)  

    month_number  facecream  facewash  toothpaste  bathingsoap  shampoo  \
0              1       2500      1500        5200         9200     1200   
1              2       2630      1200        5100         6100     2100   
2              3       2140      1340        4550         9550     3550   
3              4       3400      1130        5870         8870     1870   
4              5       3600      1740        4560         7760     1560   
5              6       2760      1555        4890         7490     1890   
6              7       2980      1120        4780         8980     1780   
7              8       3700      1400        5860         9960     2860   
8              9       3540      1780        6100         8100     2100   
9             10       1990      1890        8300        10300     2300   
10            11       2340      2100        7300        13300     2400   
11            12       2900      1760        7400        14400     1800   

    moisturizer  total_u

# Functions for Exporting data
df.to_csv(filename): It writes to a CSV file.<br>
df.to_excel(filename): It writes to an Excel file.<br>
df.to_sql(table_name, connection_object): It writes to a SQL table.<br>

In [2]:
import pandas as pd  
data = {'Name': ['Smith', 'Parker'], 'ID': [101, 102], 'Language': ['Python', 'JavaScript']}  
info = pd.DataFrame(data)  
print('DataFrame Values:\n', info)  
# With no filename, to_csv() returns the CSV text as a string
csv_data = info.to_csv()  
print('\nCSV String Values:\n', csv_data) 

DataFrame Values:
      Name   ID    Language
0   Smith  101      Python
1  Parker  102  JavaScript

CSV String Values:
 ,Name,ID,Language
0,Smith,101,Python
1,Parker,102,JavaScript
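
The CSV string above starts with an empty header field because the row index is written as the first column. A minimal sketch that leaves the index out with `index=False` and reads the text back in to check the round trip (the name `restored` is illustrative):

```python
import io
import pandas as pd

data = {"Name": ["Smith", "Parker"], "ID": [101, 102], "Language": ["Python", "JavaScript"]}
info = pd.DataFrame(data)

# index=False omits the row labels from the output.
csv_data = info.to_csv(index=False)
print(csv_data)

# Reading the string back gives the same columns and values.
restored = pd.read_csv(io.StringIO(csv_data))
print(restored)
```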

