### **Putting Some Pandas In Your Python**

Pandas is a Python package written for data analysis and manipulation. It offers data structures and operations for working with numerical tables and time series. Pandas is an open-source library built on top of NumPy, and it is widely used for its productivity and performance. It is popular because it makes importing and analyzing data much easier.


Pandas programs can be written in any plain text editor, such as Notepad or Notepad++, and saved with a .py extension. Before writing Pandas code and trying out its operations, you must have Python installed on your system. Pandas itself can then be installed with pip, as shown below.

**Installing Pandas**

In [None]:
! pip install pandas

import pandas as pd
import numpy as np



## **Pandas Series Data Structure**


A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. A Series can be thought of as a single column in an Excel sheet. Labels need not be unique, but they must be of a hashable type.



In [None]:
# pd.Series(data, index)
# index -> hashable labels (need not be unique), same length as data. Defaults to 0..n-1

s = pd.Series([1, 2, 3, 4])

print(s)

0    1
1    2
2    3
3    4
dtype: int64
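
Because labels only need to be hashable, not unique, a Series can carry repeated labels. The following is a minimal sketch of that behaviour; the labels and values are illustrative and do not come from the notebook above.

```python
import pandas as pd

# Illustrative data: the label 'a' appears twice
dup = pd.Series([10, 20, 30], index=['a', 'b', 'a'])

# Selecting a repeated label returns every matching entry as a Series
print(dup['a'])

# Selecting a label that occurs once returns a single value
print(dup['b'])

# is_unique reports whether the index is free of duplicates
print(dup.index.is_unique)
```

Lookups on a repeated label return several rows, so code that expects a single value should check `index.is_unique` first.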


## **Creating Series from Numpy ndarray**

Creating a Series from an array without an index.

Because no index is passed, the default index is range(n), where n is the length of the array.


In [None]:

data = np.array([10, 20, 30, 40, 50])

s = pd.Series(data)

print(s)

0    10
1    20
2    30
3    40
4    50
dtype: int64
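
The same array can also be given explicit labels. This sketch assumes a hypothetical list of letter labels; the only requirement is that it has the same length as the data.

```python
import numpy as np
import pandas as pd

data = np.array([10, 20, 30, 40, 50])

# Illustrative labels; the index must have the same length as the data
labels = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=labels)

print(s)

# Values can now be looked up by label
print(s['c'])
```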


## **Data accessing using Index**

Indexing in pandas means selecting particular rows and columns of data from a Series or DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing is also known as subset selection.

Pandas supports indexing with [ ], .loc[ ] (label-based), and .iloc[ ] (position-based). The older .ix[ ] indexer was deprecated and has been removed from current pandas releases.


In [None]:
s = pd.Series([1, 2, 3, 4, 5])

print(s[2])

print(s[1:])

print(s[[1, 4]])

3
1    2
2    3
3    4
4    5
dtype: int64
1    2
4    5
dtype: int64


In [None]:
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [None]:
# Retrieve multiple elements

print(s[['a', 'b', 'e']])

a    1
b    2
e    5
dtype: int64
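
The label-based and position-based indexers behave differently when slicing. A small sketch using the same labeled Series as above (the variable names `by_label` and `by_position` are illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# .loc selects by label; a label slice includes both endpoints
by_label = s.loc['b':'d']

# .iloc selects by integer position; the stop position is excluded
by_position = s.iloc[1:3]

print(by_label)
print(by_position)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index itself contains integers.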


# **Pandas DataFrame**
### **Creating DataFrame using Dictionary**

Pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is generally the most commonly used pandas object. 
Pandas DataFrame can be created in multiple ways. Let’s discuss different ways to create a DataFrame one by one.

In [None]:
data = {'Name':['Maneesha', 'Anaparthi', 'Rgukt', 'cse'], 'Count':[10,10,1,360]}
df = pd.DataFrame(data)
df

        Name  Count
0   Maneesha     10
1  Anaparthi     10
2      Rgukt      1
3        cse    360


In [None]:
#creating indexed dataframe
data = {'Name':['Maneesha', 'Anaparthi', 'Rgukt', 'cse'], 'Count':[10,10,1,360]}
df = pd.DataFrame(data,index = ['index-1','index-2','index-3','index-4'])
df

              Name  Count
index-1   Maneesha     10
index-2  Anaparthi     10
index-3      Rgukt      1
index-4        cse    360


In [None]:
#creating indexed nan dataframe
data = {'Name':['Maneesha', 'Anaparthi', np.nan, 'cse'], 'Count':[10,np.nan,1,360]}
df = pd.DataFrame(data,index = ['index-1','index-2','index-3','index-4'])
df

              Name  Count
index-1   Maneesha   10.0
index-2  Anaparthi    NaN
index-3        NaN    1.0
index-4        cse  360.0


## **dataset.info()**

The info() function prints a concise summary of a DataFrame, including the index dtype and column dtypes, the number of non-null values, and memory usage. An optional verbose argument controls whether the full summary is printed.

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, index-1 to index-4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    3 non-null      object 
 1   Count   3 non-null      float64
dtypes: float64(1), object(1)
memory usage: 256.0+ bytes
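
The non-null counts reported by info() can be cross-checked directly. This is a minimal sketch that rebuilds the DataFrame with missing values from above; the variable name `missing` is illustrative.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Maneesha', 'Anaparthi', np.nan, 'cse'],
        'Count': [10, np.nan, 1, 360]}
df = pd.DataFrame(data, index=['index-1', 'index-2', 'index-3', 'index-4'])

# isna() marks missing cells; summing the booleans counts them per column
missing = df.isna().sum()
print(missing)

# count() gives the number of non-null values per column
print(df.count())
```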


## **Creating DataFrame using Tuple**

To build a DataFrame from Python tuples, put the tuples in a list and pass that list to the pd.DataFrame() constructor; each tuple becomes one row. A Pandas DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure made up of rows and columns.

In [None]:
data = [('1/7/2021', 13, 6, 'Rain'),
       ('2/7/2021', 11, 7, 'Fog'),
       ('3/7/2021', 12, 8, 'Sunny'),
       ('4/7/2021', 8, 5, 'Snow'),
       ('5/7/2021', 9, 6, 'Rain')]
df = pd.DataFrame(data,
                  columns=['Day', 'Temperature', 'WindSpeed', 'Event'])

df

        Day  Temperature  WindSpeed  Event
0  1/7/2021           13          6   Rain
1  2/7/2021           11          7    Fog
2  3/7/2021           12          8  Sunny
3  4/7/2021            8          5   Snow
4  5/7/2021            9          6   Rain
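
As a side note, pandas also provides `DataFrame.from_records`, which accepts the same list of tuples. The sketch below uses a shortened, two-row version of the data above; the names `rows`, `cols`, `via_constructor` and `via_records` are illustrative.

```python
import pandas as pd

rows = [('1/7/2021', 13, 6, 'Rain'),
        ('2/7/2021', 11, 7, 'Fog')]
cols = ['Day', 'Temperature', 'WindSpeed', 'Event']

# Each tuple becomes one row; `columns` names the tuple positions
via_constructor = pd.DataFrame(rows, columns=cols)

# from_records builds a DataFrame from the same list of tuples
via_records = pd.DataFrame.from_records(rows, columns=cols)

print(via_records)

# A single cell can be read back by row label and column name
print(via_records.loc[1, 'Event'])
```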


### **DataFrame Basic Functionality**

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). Unlike a Python list, a Series has a single dtype for all of its values; mixed values are stored under the general object dtype. A DataFrame can be built from a dictionary of Series, where each Series becomes one column.

In [None]:
# Create a dictionary of Series
# (named `data` so that the built-in `dict` is not shadowed)
data = {'Name': pd.Series(['A', 'B', 'C', 'D', 'E', 'F', 'G']),
        'Age': pd.Series([25, 26, 25, 35, 23, 33, 31]),
        'Rating': pd.Series([4.23, 4.1, 3.4, 5, 2.9, 4.7, 3.1])}

df = pd.DataFrame(data)
df

  Name  Age  Rating
0    A   25    4.23
1    B   26    4.10
2    C   25    3.40
3    D   35    5.00
4    E   23    2.90
5    F   33    4.70
6    G   31    3.10


## **dataset.columns**

The columns attribute returns the column labels of a DataFrame as an Index object. A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary data structure of Pandas.

In [None]:
df.columns

Index(['Name', 'Age', 'Rating'], dtype='object')
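
The column Index can be used like an ordinary sequence of labels. A short sketch, using a three-row version of the DataFrame above (the variable name `subset` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Age': [25, 26, 25],
                   'Rating': [4.23, 4.1, 3.4]})

# The Index converts to a plain Python list
print(list(df.columns))

# Membership tests check the column labels
print('Age' in df.columns)

# A list of labels selects a sub-DataFrame with just those columns
subset = df[['Name', 'Rating']]
print(subset.shape)
```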

## **dataset.T**

The DataFrame.transpose() function, also available as the .T attribute, swaps the index and columns of the DataFrame. It reflects the DataFrame over its main diagonal by writing rows as columns and vice versa.

In [None]:

# Transpose-> returns transpose of DataFrame
df.T

           0    1    2    3    4    5    6
Name       A    B    C    D    E    F    G
Age       25   26   25   35   23   33   31
Rating  4.23  4.1  3.4  5.0  2.9  4.7  3.1
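
Transposing a DataFrame with mixed column types has a side effect on dtypes that is worth seeing. The sketch below uses a three-row version of the data; the variable name `t` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Age': [25, 26, 25],
                   'Rating': [4.23, 4.1, 3.4]})

t = df.T

# Rows and columns swap places: 3 columns become 3 rows
print(t.shape)

# Each transposed column now mixes a string, an int and a float,
# so every column falls back to the object dtype
print(t.dtypes)

# Transposing again restores the labels, but not the original dtypes
print(t.T.columns.tolist())
print(t.T.dtypes)
```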


In [None]:
# dtypes-> return datatype of each column

df.dtypes

Name       object
Age         int64
Rating    float64
dtype: object

In [None]:
# shape-> returns a tuple representing the dimensionality (rows, columns)

df.shape

(7, 3)

In [None]:

# Axes-> returns list of row axis labels and column axis labels

df.axes

[RangeIndex(start=0, stop=7, step=1),
 Index(['Name', 'Age', 'Rating'], dtype='object')]

In [None]:
# info-> prints a summary with the dtype and non-null count of each column

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    7 non-null      object 
 1   Age     7 non-null      int64  
 2   Rating  7 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 296.0+ bytes


In [None]:
# values-> returns actual data as ndarray

df.values

array([['A', 25, 4.23],
       ['B', 26, 4.1],
       ['C', 25, 3.4],
       ['D', 35, 5.0],
       ['E', 23, 2.9],
       ['F', 33, 4.7],
       ['G', 31, 3.1]], dtype=object)

In [None]:
# head-> by default head returns first 5 rows

df.head()

  Name  Age  Rating
0    A   25    4.23
1    B   26    4.10
2    C   25    3.40
3    D   35    5.00
4    E   23    2.90


In [None]:

# tail-> by default tail returns last 5 rows

df.tail()

  Name  Age  Rating
2    C   25     3.4
3    D   35     5.0
4    E   23     2.9
5    F   33     4.7
6    G   31     3.1
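
Both head() and tail() accept the number of rows to return. A minimal sketch with a two-column version of the data (the names `first_two` and `last_three` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': list('ABCDEFG'),
                   'Age': [25, 26, 25, 35, 23, 33, 31]})

# Pass n to override the default of 5 rows
first_two = df.head(2)
last_three = df.tail(3)

print(first_two)
print(last_three)
```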



## **Statistics**

In [None]:
# sum()-> returns the sum of values for requested axis. by default axis = 0

df.sum()

Name      ABCDEFG
Age           198
Rating      27.43
dtype: object

In [None]:
# axis = 1 -> row-wise sum; numeric_only=True skips the string column 'Name'

print(df.sum(axis=1, numeric_only=True))

0    29.23
1    30.10
2    28.40
3    40.00
4    25.90
5    37.70
6    34.10
dtype: float64


In [None]:

# mean() -> column means; numeric_only=True skips the string column 'Name'

print(df.mean(numeric_only=True))

Age       28.285714
Rating     3.918571
dtype: float64


In [None]:
# std() -> column standard deviations; numeric_only=True skips the string column 'Name'

print(df.std(numeric_only=True))

Age       4.644505
Rating    0.804828
dtype: float64
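
The standard deviation reported by pandas is the sample standard deviation (it divides by n - 1). The sketch below checks this against a manual calculation for the Age column; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 26, 25, 35, 23, 33, 31])

# pandas uses ddof=1 (sample standard deviation) by default
sample_std = age.std()

# ddof=0 gives the population standard deviation instead
population_std = age.std(ddof=0)

# Manual sample standard deviation for comparison
manual = np.sqrt(((age - age.mean()) ** 2).sum() / (len(age) - 1))

print(sample_std, population_std, manual)
```

Note that NumPy's `np.std` defaults to `ddof=0`, so the two libraries give different results unless `ddof` is set explicitly.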


In [None]:
# describe() -> summarizing the data

print(df.describe())

             Age    Rating
count   7.000000  7.000000
mean   28.285714  3.918571
std     4.644505  0.804828
min    23.000000  2.900000
25%    25.000000  3.250000
50%    26.000000  4.100000
75%    32.000000  4.465000
max    35.000000  5.000000
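
The percentile rows in describe() correspond to quantile() calls, and the set of percentiles can be changed. A small sketch for the Age column (the names `q` and `summary` are illustrative):

```python
import pandas as pd

age = pd.Series([25, 26, 25, 35, 23, 33, 31])

# The 25%, 50% and 75% rows of describe() are quantiles of the column
q = age.quantile([0.25, 0.5, 0.75])
print(q)

# describe() accepts a custom list of percentiles
summary = age.describe(percentiles=[0.1, 0.9])
print(summary)
```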


In [None]:
# include -> which column types to summarize: 'object', 'number', or 'all'

print(df.describe(include=['object']))

       Name
count     7
unique    7
top       C
freq      1


In [None]:
print(df.describe(include=['number']))

             Age    Rating
count   7.000000  7.000000
mean   28.285714  3.918571
std     4.644505  0.804828
min    23.000000  2.900000
25%    25.000000  3.250000
50%    26.000000  4.100000
75%    32.000000  4.465000
max    35.000000  5.000000


In [None]:
# Pass 'all' as a plain string, not inside a list

print(df.describe(include='all'))

       Name        Age    Rating
count     7   7.000000  7.000000
unique    7        NaN       NaN
top       C        NaN       NaN
freq      1        NaN       NaN
mean    NaN  28.285714  3.918571
std     NaN   4.644505  0.804828
min     NaN  23.000000  2.900000
25%     NaN  25.000000  3.250000
50%     NaN  26.000000  4.100000
75%     NaN  32.000000  4.465000
max     NaN  35.000000  5.000000
