## [Pandas](https://pandas.pydata.org/docs/)

- In computer programming, pandas is a software library written for the Python programming language for **data manipulation and analysis.**

- In particular, it offers data structures and operations for manipulating numerical tables and time series.

- It is free software released under the three-clause BSD license.

## Importing The Library:

In [19]:
import pandas as pd

## [pd.read_extension](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)

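- pandas.read_excel("path_to_file/file_name.xlsx") reads an Excel workbook into a DataFrame.

- This dataset is stored in an Excel file, which is why the function name ends in "_excel".

- Pandas is capable of **reading CSV, JSON, pickle and many other file formats as well**, each through its own read_ function.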
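As a minimal sketch of the other readers, the example below loads the same two rows from CSV text and from JSON text. The in-memory strings `csv_text` and `json_text` are hypothetical stand-ins for files on disk, used only so the example runs without any external file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a .csv file on disk.
csv_text = "Item,Units\nPencil,95\nBinder,50\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# The same two rows as JSON records, standing in for a .json file.
json_text = '[{"Item": "Pencil", "Units": 95}, {"Item": "Binder", "Units": 50}]'
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

Both calls return an ordinary DataFrame, so everything that follows in this notebook applies regardless of the original file format.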

In [36]:
sett = pd.read_excel(r'D:\DATA SCIENCE\Free data\SampleData.xlsx')  # raw string, so the backslashes are not read as escape sequences

## [dataframe.head(n)](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html)

- This function returns the **first n rows** for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

- By default, if no argument is passed to head(), it returns the first five rows of the dataframe.

In [21]:
sett.head()

| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
|---|---|---|---|---|---|---|---|
| 0 | 2018-01-06 | East | Jones | Pencil | 95 | 1.99 | 189.05 |
| 1 | 2018-01-23 | Central | Kivell | Binder | 50 | 19.99 | 999.5 |
| 2 | 2018-02-09 | Central | Jardine | Pencil | 36 | 4.99 | 179.64 |
| 3 | 2018-02-26 | Central | Gill | Pen | 27 | 19.99 | 539.73 |
| 4 | 2018-03-15 | West | Sorvino | Pencil | 56 | 2.99 | 167.44 |
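To illustrate the n argument, here is a small sketch on a stand-in DataFrame. The frame `df` is rebuilt by hand from two columns of the output above and is not the full dataset.

```python
import pandas as pd

# Stand-in frame built from the Item and Units columns shown above.
df = pd.DataFrame({
    "Item": ["Pencil", "Binder", "Pencil", "Pen", "Pencil"],
    "Units": [95, 50, 36, 27, 56],
})

first_two = df.head(2)          # the first two rows
all_but_last_two = df.head(-2)  # a negative n returns everything except the last n rows

print(first_two)
print(all_but_last_two)
```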


## [dataframe.dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html)

- **DataFrame.dtypes** is a property, not a method, so it is accessed without parentheses.

- It returns a Series holding the data type of each column in the DataFrame.

In [2]:
sett.dtypes

OrderDate    datetime64[ns]
Region               object
Rep                  object
Item                 object
Units                 int64
Unit Cost           float64
Total               float64
dtype: object
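Because dtypes is itself a Series indexed by column name, a single column's type can be looked up by label. The sketch below uses a small hand-built frame `df` (an assumption, not the real dataset) and also shows select_dtypes, one way to pick out only the numeric columns.

```python
import pandas as pd

# Small stand-in frame with a date, a text and two numeric columns.
df = pd.DataFrame({
    "OrderDate": pd.to_datetime(["2018-01-06", "2018-01-23"]),
    "Item": ["Pencil", "Binder"],
    "Units": [95, 50],
    "Unit Cost": [1.99, 19.99],
})

types = df.dtypes                 # Series: column name -> dtype
units_type = types["Units"]       # look up one column by its label
print(units_type)

# Names of the columns whose dtype is numeric.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```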

## [data.isnull()](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.isnull.html)

- data.isnull() checks every value and returns a boolean DataFrame of the same shape: True where the value is missing (NaN, None or NaT) and False otherwise.

## data.isnull().sum()

- This returns the number of null values in each column.

In [23]:
sett.isnull().sum()

OrderDate    0
Region       0
Rep          0
Item         0
Units        0
Unit Cost    0
Total        0
dtype: int64

### No Null Values.
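Since the dataset above has no missing values, the following sketch uses a hypothetical frame `df` with two deliberate gaps to show what isnull() and isnull().sum() return when nulls are present.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Item and one missing Units value.
df = pd.DataFrame({
    "Item": ["Pencil", None, "Pen"],
    "Units": [95, np.nan, 27],
})

mask = df.isnull()                      # boolean frame, True where a value is missing
missing_per_column = df.isnull().sum()  # True counts as 1, so this is a count per column

print(mask)
print(missing_per_column)
```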

In [3]:
sett[sett['Units']<90].head()     # keeps only the rows with Units below 90, then shows the first five of them

| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
|---|---|---|---|---|---|---|---|
| 1 | 2018-01-23 | Central | Kivell | Binder | 50 | 19.99 | 999.5 |
| 2 | 2018-02-09 | Central | Jardine | Pencil | 36 | 4.99 | 179.64 |
| 3 | 2018-02-26 | Central | Gill | Pen | 27 | 19.99 | 539.73 |
| 4 | 2018-03-15 | West | Sorvino | Pencil | 56 | 2.99 | 167.44 |
| 5 | 2018-04-01 | East | Jones | Binder | 60 | 4.99 | 299.4 |


In [6]:
sett.tail()  # shows the last five rows

| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
|---|---|---|---|---|---|---|---|
| 38 | 2019-10-14 | West | Thompson | Binder | 57 | 19.99 | 1139.43 |
| 39 | 2019-10-31 | Central | Andrews | Pencil | 14 | 1.29 | 18.06 |
| 40 | 2019-11-17 | Central | Jardine | Binder | 11 | 4.99 | 54.89 |
| 41 | 2019-12-04 | Central | Jardine | Binder | 94 | 19.99 | 1879.06 |
| 42 | 2019-12-21 | Central | Andrews | Binder | 28 | 4.99 | 139.72 |


In [7]:
sett.shape

(43, 7)

In [6]:
data = sett.tail()      # assign the last five rows to another variable

In [7]:
data

| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
|---|---|---|---|---|---|---|---|
| 38 | 2019-10-14 | West | Thompson | Binder | 57 | 19.99 | 1139.43 |
| 39 | 2019-10-31 | Central | Andrews | Pencil | 14 | 1.29 | 18.06 |
| 40 | 2019-11-17 | Central | Jardine | Binder | 11 | 4.99 | 54.89 |
| 41 | 2019-12-04 | Central | Jardine | Binder | 94 | 19.99 | 1879.06 |
| 42 | 2019-12-21 | Central | Andrews | Binder | 28 | 4.99 | 139.72 |


In [8]:
data.shape

(5, 7)

## Slicing The DATA

In [9]:
data[data['Units']>20]

| | OrderDate | Region | Rep | Item | Units | Unit Cost | Total |
|---|---|---|---|---|---|---|---|
| 38 | 2019-10-14 | West | Thompson | Binder | 57 | 19.99 | 1139.43 |
| 41 | 2019-12-04 | Central | Jardine | Binder | 94 | 19.99 | 1879.06 |
| 42 | 2019-12-21 | Central | Andrews | Binder | 28 | 4.99 | 139.72 |
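The expression inside the brackets is a boolean Series (a mask), and only the rows where it is True are kept. The sketch below rebuilds two columns of the five rows above by hand, so `df` is a stand-in for `data`, and also shows one way to combine two conditions with `&`.

```python
import pandas as pd

# Stand-in for `data`: Item and Units of the last five rows, with the same index labels.
df = pd.DataFrame(
    {
        "Item": ["Binder", "Pencil", "Binder", "Binder", "Binder"],
        "Units": [57, 14, 11, 94, 28],
    },
    index=[38, 39, 40, 41, 42],
)

mask = df["Units"] > 20   # boolean Series, one True/False per row
filtered = df[mask]       # keeps the rows where the mask is True

# Each condition needs its own parentheses when combined with & (and) or | (or).
between = df[(df["Units"] > 20) & (df["Units"] < 90)]

print(filtered)
print(between)
```

Note that the original index labels are preserved in the result; they are not renumbered from zero.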


## Aggregating methods

In [10]:
sett.mean(numeric_only=True)   # mean of each numeric column; pandas 2.0+ raises a TypeError on the text columns without numeric_only=True

Units         49.325581
Unit Cost     20.308605
Total        456.462326
dtype: float64

In [10]:
sett.describe()

| | Units | Unit Cost | Total |
|---|---|---|---|
| count | 43.0 | 43.0 | 43.0 |
| mean | 49.325581 | 20.308605 | 456.462326 |
| std | 30.078248 | 47.345118 | 447.022104 |
| min | 2.0 | 1.29 | 9.03 |
| 25% | 27.5 | 3.99 | 144.59 |
| 50% | 53.0 | 4.99 | 299.4 |
| 75% | 74.5 | 17.99 | 600.18 |
| max | 96.0 | 275.0 | 1879.06 |
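A short sketch of the same aggregations on a stand-in frame follows. The frame `df` is rebuilt from the first five rows shown earlier, so its numbers differ from the full-dataset output above.

```python
import pandas as pd

# Stand-in frame: Item, Units and Unit Cost of the first five rows.
df = pd.DataFrame({
    "Item": ["Pencil", "Binder", "Pencil", "Pen", "Pencil"],
    "Units": [95, 50, 36, 27, 56],
    "Unit Cost": [1.99, 19.99, 4.99, 19.99, 2.99],
})

# numeric_only=True skips the text column instead of failing on it.
means = df.mean(numeric_only=True)

# describe() summarises the numeric columns: count, mean, std, min, quartiles, max.
summary = df.describe()

print(means)
print(summary)
```

Individual statistics can be read from the summary by row and column label, for example `summary.loc["max", "Units"]`.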


In [13]:
sett.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43 entries, 0 to 42
Data columns (total 7 columns):
OrderDate    43 non-null datetime64[ns]
Region       43 non-null object
Rep          43 non-null object
Item         43 non-null object
Units        43 non-null int64
Unit Cost    43 non-null float64
Total        43 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(1), object(3)
memory usage: 2.4+ KB


## Selecting a column from dataframe - A Series.

In [9]:
y=sett['Region']  # this will provide all the values of 'Region' column.
y.head()

0       East
1    Central
2    Central
3    Central
4       West
Name: Region, dtype: object

In [15]:
type(y)          # Clearly y is a series.

pandas.core.series.Series
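Single brackets with one column name give a Series, while double brackets (a list of names) give a DataFrame, even when the list has only one entry. The sketch below shows the difference on a hand-built stand-in frame `df`.

```python
import pandas as pd

# Stand-in frame built from the Region and Units columns of the first five rows.
df = pd.DataFrame({
    "Region": ["East", "Central", "Central", "Central", "West"],
    "Units": [95, 50, 36, 27, 56],
})

region_series = df["Region"]    # one label   -> Series (one-dimensional)
region_frame = df[["Region"]]   # list of labels -> DataFrame (two-dimensional)

print(type(region_series), region_series.shape)
print(type(region_frame), region_frame.shape)
```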

## [dataframe.drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html)

- DataFrame.drop(self, labels=None, axis=0 (rows) / 1 (columns), index=None, columns=None, level=None, inplace=True/False, errors='raise')

- **Drop specified labels from rows or columns.**

- Remove rows or columns by specifying label names and corresponding axis,
or by specifying directly index or column names.

- When using a multi-index, labels on different levels can be removed by specifying the level.

In [37]:
temp = sett   # note: this is not a copy; temp and sett refer to the same DataFrame

In [38]:
temp.columns

Index(['OrderDate', 'Region', 'Rep', 'Item', 'Units', 'Unit Cost', 'Total'], dtype='object')

In [39]:
temp.drop(["Rep","Region"],axis=1,inplace=True)
# axis=1 means the labels are looked up among the columns.
# inplace=True modifies temp directly; without it, drop() returns
# a new DataFrame and temp itself is left unchanged.
temp.head()

| | OrderDate | Item | Units | Unit Cost | Total |
|---|---|---|---|---|---|
| 0 | 2018-01-06 | Pencil | 95 | 1.99 | 189.05 |
| 1 | 2018-01-23 | Binder | 50 | 19.99 | 999.5 |
| 2 | 2018-02-09 | Pencil | 36 | 4.99 | 179.64 |
| 3 | 2018-02-26 | Pen | 27 | 19.99 | 539.73 |
| 4 | 2018-03-15 | Pencil | 56 | 2.99 | 167.44 |


**Remember: once the columns have been dropped, running the above cell again raises a KeyError, because the columns to drop no longer exist.**
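The sketch below pulls these points together on a hypothetical three-row frame `sales`, which is not the real dataset. It works on an explicit copy so the original keeps its columns, compares drop() with and without inplace=True, and uses errors="ignore" as one way to make a repeated drop harmless.

```python
import pandas as pd

# Hypothetical stand-in for the dataset.
sales = pd.DataFrame({
    "Region": ["East", "Central", "Central"],
    "Rep": ["Jones", "Kivell", "Jardine"],
    "Item": ["Pencil", "Binder", "Pencil"],
    "Units": [95, 50, 36],
})

# .copy() makes an independent frame; plain assignment would only create a second name.
temp = sales.copy()

# Without inplace=True, drop() returns a new frame and temp keeps all four columns.
without_inplace = temp.drop(["Rep", "Region"], axis=1)
columns_before = list(temp.columns)

# With inplace=True, temp itself is modified and the call returns None.
temp.drop(columns=["Rep", "Region"], inplace=True)

# A second drop would normally raise KeyError; errors="ignore" skips missing labels.
temp.drop(columns=["Rep", "Region"], errors="ignore", inplace=True)

print(list(temp.columns))
print(list(sales.columns))  # unchanged, because temp was a copy
```

Passing columns=[...] is equivalent to passing the labels with axis=1; which form to use is a matter of readability.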
