# Introduction to Pandas

pandas is a Python package for working with structured and statistical data. It provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

This notebook is a reference for data input and output: pandas can read a variety of file types using its `pd.read_*` functions and write them back out with the matching `DataFrame.to_*` methods.

### Why is pandas important?

Artificial intelligence products rely on machine learning algorithms. For any ML algorithm to be effective, the following prerequisite steps must be completed first:

- Data Collection: conducting opinion surveys, scraping the internet, etc.
- Data Handling: viewing data as a table and cleaning it, for example by checking spelling, removing blanks, fixing wrong letter case, and removing invalid values.
- Data Visualization: plotting clear graphs, so anyone who looks at the data can see what story it tells.

pandas is mainly used for the data handling step.
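As a rough illustration of the data handling step, the sketch below cleans a small table. The data and the cleaning rules (lowercase names, no missing names, no negative salaries) are assumptions made up for this example, not part of the notebook's dataset.

```python
import pandas as pd

# Hypothetical data with typical problems: stray whitespace,
# inconsistent case, a missing name, and an impossible (negative) salary.
raw = pd.DataFrame({
    'name': [' mayu', 'VISH ', None, 'nish'],
    'Sal': [1234, 12324, 4574, -1],
})

clean = raw.copy()
# Normalize text: trim whitespace and use one case throughout.
clean['name'] = clean['name'].str.strip().str.lower()
# Remove rows with a missing name.
clean = clean.dropna(subset=['name'])
# Remove rows with invalid values (here: negative salaries).
clean = clean[clean['Sal'] >= 0].reset_index(drop=True)

print(clean)
```

Only the rows for `mayu` and `vish` survive: one row is dropped for its missing name and one for its negative salary.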

Let us start exploring it.

In [1]:
# Import the libraries
import numpy as np
import pandas as pd

In [2]:
# Creating a DataFrame

df = pd.DataFrame({'name': ['mayu', 'vish', 'nish', 'pratima', 'utkranti', 'vinayak', 'Avishkar'],
                   'level': ['l1', 'l2', 'l3', 'l5', 'l1', 'l2', 'l1'],
                   'Sal': [1234, 12324, 4574, 56780, 578957, 567, 356]})
# Save the DataFrame to an Excel file for use later in this notebook
df.to_excel('employee.xlsx', sheet_name='Sheet1')

## CSV
Loading a dataset from a CSV file is one of the most common pandas operations.

### CSV Input

In [5]:
# Loading data from a CSV file

Unnamed: 0,name,level,Sal
0,mayu,l1,1234
1,vish,l2,12324
2,nish,l3,4574
3,pratima,l5,56780
4,utkranti,l1,578957
5,vinayak,l2,567


<details>
<summary>Solution</summary>
<p>
    
```python
df = pd.read_csv('employee1.csv')
df

```
    
</p>
</details>
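The `Unnamed: 0` column in the output above is typical of a CSV file that was saved together with its row index. A minimal sketch of how `index_col=0` avoids it, using an in-memory string (`csv_text`, an assumption for this example) in place of a file on disk:

```python
import io
import pandas as pd

# Stand-in for a file on disk; the first, unnamed column is a saved index.
csv_text = """,name,level,Sal
0,mayu,l1,1234
1,vish,l2,12324
2,nish,l3,4574
"""

# Default: the unnamed column is loaded as data and labelled 'Unnamed: 0'.
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```

The same `index_col=0` argument works when a file path is passed instead of a `StringIO` object.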

### CSV Output


In [6]:
# Writing the DataFrame to a CSV file

<details>
<summary>Solution</summary>
<p>
    
```python
df.to_csv('example.csv', index=False)
```
    
</p>
</details>
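To see what `index=False` changes, the sketch below compares the two outputs as text. It relies on `to_csv` returning the CSV content as a string when no path is given; `df_demo` is a small stand-in DataFrame created only for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({'name': ['mayu', 'vish'], 'Sal': [1234, 12324]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df_demo.to_csv()
without_index = df_demo.to_csv(index=False)

# First line of each output is the header row.
print(with_index.splitlines()[0])     # begins with an empty label for the index
print(without_index.splitlines()[0])  # contains only the column names
```

Leaving the index out on write means there is no extra unnamed column to deal with when the file is read back.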

In [7]:
df

Unnamed: 0,name,level,Sal
0,mayu,l1,1234
1,vish,l2,12324
2,nish,l3,4574
3,pratima,l5,56780
4,utkranti,l1,578957
5,vinayak,l2,567


## Excel
pandas can read and write Excel files. Keep in mind that this only imports data, not formulas or images; workbooks that contain images or macros may cause the `read_excel` method to crash.

In [8]:
# Loading the dataset from an Excel file

Unnamed: 0.1,Unnamed: 0,name,level,Sal
0,0,mayu,l1,1234
1,1,vish,l2,12324
2,2,nish,l3,4574
3,3,pratima,l5,56780
4,4,utkranti,l1,578957
5,5,vinayak,l2,567
6,6,Avishkar,l1,356



<details>
<summary>Solution</summary>
<p>
    
```python
df = pd.read_excel('employee.xlsx')
df
```
    
</p>
</details>

In [9]:
# Write the dataset to an Excel file in xlsx format



<details>
<summary>Solution</summary>
<p>
    
```python
df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
df
```
    
</p>
</details>
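The `Unnamed: 0` and `Unnamed: 0.1` columns in the Excel output above most likely come from the row index being written to the sheet on each save and read back as data. A hedged sketch of a round trip that avoids this with `index=False`, using an in-memory buffer in place of a real file and assuming an Excel engine such as openpyxl is installed:

```python
import io
import pandas as pd

df_demo = pd.DataFrame({'name': ['mayu', 'vish'], 'level': ['l1', 'l2']})

round_trip = None
try:
    # An in-memory buffer stands in for a file such as 'Excel_Sample.xlsx'.
    buffer = io.BytesIO()
    # index=False keeps the row index out of the sheet.
    df_demo.to_excel(buffer, sheet_name='Sheet1', index=False)
    buffer.seek(0)
    # Read the same sheet back; no extra unnamed column should appear.
    round_trip = pd.read_excel(buffer, sheet_name='Sheet1')
    print(list(round_trip.columns))
except ImportError:
    # pandas needs an Excel engine (commonly openpyxl) for .xlsx files.
    print('No Excel engine installed; try installing openpyxl.')
```

Alternatively, a file that already contains a saved index can be read with `index_col=0`, as with `read_csv`.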

## HTML

You may need to install html5lib, lxml, and BeautifulSoup4. In your terminal/command prompt run:

    conda install lxml
    conda install html5lib
    conda install BeautifulSoup4

If you aren't using the Anaconda distribution, use `pip install` instead. Then restart Jupyter Notebook.
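One way to confirm the installation worked is to check whether Python can find each package. This is a small sketch using only the standard library; note that BeautifulSoup4 is imported under the name `bs4`.

```python
import importlib.util

# Import names of the optional HTML parsing packages.
# BeautifulSoup4 is installed as 'beautifulsoup4' but imported as 'bs4'.
parsers = ['lxml', 'html5lib', 'bs4']

# find_spec returns None when a top-level package cannot be found.
installed = {name: importlib.util.find_spec(name) is not None for name in parsers}

for name, ok in installed.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```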

pandas can read table tags from HTML. For example:

### HTML Input

The pandas `read_html` function reads the tables on a web page and returns a list of DataFrame objects:

In [10]:
# Reading a dataset from an HTML page


<details>
<summary>Solution</summary>
<p>
    
```python
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')


```
    
</p>
</details>

In [11]:
df1 = df[0]
df1


Unnamed: 0,Bank Name,City,ST,CERT,Acquiring Institution,Closing Date
0,Almena State Bank,Almena,KS,15426,Equity Bank,"October 23, 2020"
1,First City Bank of Florida,Fort Walton Beach,FL,16748,"United Fidelity Bank, fsb","October 16, 2020"
2,The First State Bank,Barboursville,WV,14361,"MVB Bank, Inc.","April 3, 2020"
3,Ericson State Bank,Ericson,NE,18265,Farmers and Merchants Bank,"February 14, 2020"
4,City National Bank of New Jersey,Newark,NJ,21111,Industrial Bank,"November 1, 2019"
...,...,...,...,...,...,...
558,"Superior Bank, FSB",Hinsdale,IL,32646,"Superior Federal, FSB","July 27, 2001"
559,Malta National Bank,Malta,OH,6629,North Valley Bank,"May 3, 2001"
560,First Alliance Bank & Trust Co.,Manchester,NH,34264,Southern New Hampshire Bank & Trust,"February 2, 2001"
561,National State Bank of Metropolis,Metropolis,IL,3815,Banterra Bank of Marion,"December 14, 2000"
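Because `read_html` returns a list, a table has to be selected by position even when the page contains only one. The sketch below shows this offline with a tiny hand-written HTML snippet (an assumption for illustration, loosely modelled on the bank list above) and assumes at least one HTML parser is installed.

```python
import io
import pandas as pd

# A small hand-written HTML page containing a single table.
html = """
<html><body>
<table>
  <tr><th>Bank Name</th><th>ST</th></tr>
  <tr><td>Almena State Bank</td><td>KS</td></tr>
  <tr><td>The First State Bank</td><td>WV</td></tr>
</table>
</body></html>
"""

tables = None
try:
    # read_html returns a list with one DataFrame per table found.
    tables = pd.read_html(io.StringIO(html))
    first = tables[0]
    print(len(tables))
    print(first)
except ImportError:
    print('No HTML parser installed; install lxml, or html5lib with BeautifulSoup4.')
```

Wrapping the text in `io.StringIO` passes it as a file-like object, which avoids handing a raw HTML string to `read_html`.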


## Conclusion

This is how we can read and write CSV and Excel files, and read HTML tables, using pandas.

## Additional Resources

1. Pandas Loading : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
2. Dataframes in Pandas: https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python