# Pandas
<li>Pandas is an open-source Python package, built on top of NumPy, for working with data sets.</li>
<li>The name "Pandas" is derived from "panel data", an econometrics term, and is also a play on <b>"Python Data Analysis".</b></li>
<li>Pandas is one of the most widely used data-wrangling packages in Python.</li>
<li>Pandas offers easy-to-use data structures and tools for analyzing, cleaning, exploring, and manipulating data.</li>
<li>It also works well with other Python data science libraries, such as NumPy and Matplotlib.</li>


## Why Use Pandas?

<li>Pandas is known for its exceptional ability to represent and organize data.</li>
<li>The Pandas library was designed to work with large datasets quickly and efficiently by building on fast, vectorized NumPy operations.</li>
<li>It excels at analyzing large amounts of data, as long as they fit in memory. Pandas allows us to summarize data and draw conclusions based on statistical methods.</li>
<li>Pandas can clean messy data sets and make them readable and relevant.</li>
<li>By building on NumPy and integrating with Matplotlib for plotting, Pandas offers users a powerful tool for performing <b>data analytics and visualization.</b></li>
<li>Data can be imported into Pandas from a variety of sources, such as CSV, Excel, and JSON files and SQL databases, among others.</li>
<li>Pandas is a versatile and marketable skill for data analysts and data scientists, and one that employers look for.</li>
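As a minimal sketch of importing data from different sources, the snippet builds a small CSV string, a JSON string, and an in-memory SQLite table (all hypothetical sample data) and loads each one with the matching pandas reader: pd.read_csv, pd.read_json and pd.read_sql_query. Excel is left out because pd.read_excel needs an extra engine package such as openpyxl.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data, kept in memory so the example needs no files.
csv_text = "name,age\nnaresh,24\nram,25\n"
json_text = '[{"name": "naresh", "age": 24}, {"name": "ram", "age": 25}]'

# CSV: read_csv accepts a path or any file-like object.
csv_df = pd.read_csv(io.StringIO(csv_text))

# JSON: with orient="records", each object in the list becomes one row.
json_df = pd.read_json(io.StringIO(json_text), orient="records")

# SQL: read_sql_query runs a query over a database connection (sqlite3 here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("naresh", 24), ("ram", 25)])
sql_df = pd.read_sql_query("SELECT name, age FROM people", conn)
conn.close()

print(csv_df)
print(json_df)
print(sql_df)
```

All three readers return the same kind of object, a DataFrame, so the rest of an analysis does not depend on where the data came from.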


## Installation Of Pandas
<li>Open your terminal, activate your virtual environment, and then run the following command to install pandas:</li>

<code>
    pip install pandas
</code>

## Importing Pandas
<li>Before we can create a pandas DataFrame or perform any analysis on it, we need to import pandas.</li>
<li>We can import the pandas package, conventionally aliased as pd, using the following command:</li>
<code>
    import pandas as pd
</code>

In [6]:
import pandas as pd

## How To Create A Pandas DataFrame
<li>A Pandas DataFrame is a two-dimensional data structure, similar to a 2D array, that arranges data in a table of rows and columns.</li>
<li>We can create a basic pandas DataFrame in several ways.</li>
<li>Let's discuss some of these methods:</li>
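Before looking at the construction methods, a short sketch of the "rows and columns" structure may help. Using a hypothetical two-row table, it shows the attributes pandas exposes for a DataFrame's shape, column labels, row labels (the index) and per-column data types.

```python
import pandas as pd

# Hypothetical sample table: 2 rows x 3 columns.
df = pd.DataFrame({"name": ["naresh", "ram"], "age": [24, 25], "address": ["bkt", "ktm"]})

print(df.shape)          # (2, 3): (number of rows, number of columns)
print(list(df.columns))  # ['name', 'age', 'address']: the column labels
print(list(df.index))    # [0, 1]: the default row labels
print(df.dtypes)         # one data type per column
```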

### 1. From Python Dictionary

In [13]:
data = {
    "name": ["naresh", "ram"],
    "age": [24, 25],
    "address": ["bkt", "ktm"]
}

In [15]:
df = pd.DataFrame(data)
df

|   | name | age | address |
|---|---|---|---|
| 0 | naresh | 24 | bkt |
| 1 | ram | 25 | ktm |
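The dictionary keys become column labels and each list becomes a column, so all lists must have the same length (otherwise pandas raises a ValueError). As a sketch, the constructor's columns= and index= arguments can also select and reorder the keys and set custom row labels; the labels "a" and "b" here are hypothetical.

```python
import pandas as pd

data = {
    "name": ["naresh", "ram"],
    "age": [24, 25],
    "address": ["bkt", "ktm"],
}

# columns= keeps only the listed keys, in that order; index= sets the row labels.
df = pd.DataFrame(data, columns=["address", "name"], index=["a", "b"])
print(df)
```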


### 2. From a list of dictionaries

In [19]:
list_dic = [
    {"name": "Hari", "age": 21},
    {"name": "amisha", "age": 20},
]

In [20]:
list_df = pd.DataFrame(list_dic)
list_df

|   | name | age |
|---|---|---|
| 0 | Hari | 21 |
| 1 | amisha | 20 |
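With a list of dictionaries, each dictionary becomes one row and the union of all keys becomes the columns. A minimal sketch with hypothetical records, where the second record has no "city" key, shows that pandas fills the missing entry with NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical records: the second one has no "city" key.
records = [
    {"name": "Hari", "age": 21, "city": "ktm"},
    {"name": "amisha", "age": 20},
]

df = pd.DataFrame(records)
print(df)

# The missing key shows up as NaN in that row.
print(df["city"].isna().tolist())  # [False, True]
```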


### 3. From a list of tuples

In [21]:
list_tuple = [
    ("naresh", 23, "bkt"),
    ("megamind", 34, "ktm")
]

In [22]:
tuple_df = pd.DataFrame(list_tuple)
tuple_df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | naresh | 23 | bkt |
| 1 | megamind | 34 | ktm |
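Because tuples carry no field names, pandas labels the columns 0, 1, 2 by position. A short sketch of passing columns= to give them meaningful names; the names chosen here are an assumption based on what the values appear to represent.

```python
import pandas as pd

people = [
    ("naresh", 23, "bkt"),
    ("megamind", 34, "ktm"),
]

# Each tuple becomes a row; columns= replaces the default labels 0, 1, 2.
df = pd.DataFrame(people, columns=["name", "age", "address"])
print(df)
```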


### 4. From a list of lists

In [23]:
nested_list = [
    ["naresh", 23, "ktm"],
    ["megamind", 24, "bkt"]
]

In [25]:
nested_df = pd.DataFrame(nested_list)
nested_df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | naresh | 23 | ktm |
| 1 | megamind | 24 | bkt |
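Methods 3 and 4 behave the same way: pandas treats each inner sequence as one row, whether it is a tuple or a list. A quick sketch that checks this with DataFrame.equals on the same hypothetical rows stored both ways:

```python
import pandas as pd

rows_as_lists = [["naresh", 23, "ktm"], ["megamind", 24, "bkt"]]
rows_as_tuples = [tuple(row) for row in rows_as_lists]

from_lists = pd.DataFrame(rows_as_lists)
from_tuples = pd.DataFrame(rows_as_tuples)

# Same values, same column labels (0, 1, 2), same data types.
print(from_lists.equals(from_tuples))  # True
```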


# Question
1. Read the 'imports-85.data' file using a CSV file reader.
2. Store the data from the file in a list of lists.
3. Create a pandas DataFrame from the list of lists.
4. For the column names, use the columns variable given below.

In [26]:
import csv

# Read every row of the file as a list of strings (the csv docs recommend newline="").
with open("data/imports-85.data", newline="") as file:
    reader = csv.reader(file)
    data_list = list(reader)


In [27]:
_data_df = pd.DataFrame(data_list)
_data_df.head()

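|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 1 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 2 | 1 | ? | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |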


In [28]:
columns = ['symboling', 'normalized_losses', 'make', 'fuel_type', 'aspiration', 'num_of_doors',
           'body_style', 'drive_wheels', 'engine_location', 'wheel_base', 'length', 'width',
           'height', 'curb_weight', 'engine_type', 'num_of_cylinders', 'engine_size', 'fuel_system',
           'bore', 'stroke', 'compression', 'horsepower', 'peak_rpm', 'city_mpg', 'highway_mpg',
           'price']

In [31]:
_data_df.columns = columns
_data_df

|   | symboling | normalized_losses | make | fuel_type | aspiration | num_of_doors | body_style | drive_wheels | engine_location | wheel_base | ... | engine_size | fuel_system | bore | stroke | compression | horsepower | peak_rpm | city_mpg | highway_mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.60 | ... | 130 | mpfi | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495 |
| 1 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.60 | ... | 130 | mpfi | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 16500 |
| 2 | 1 | ? | alfa-romero | gas | std | two | hatchback | rwd | front | 94.50 | ... | 152 | mpfi | 2.68 | 3.47 | 9.00 | 154 | 5000 | 19 | 26 | 16500 |
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.80 | ... | 109 | mpfi | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.40 | ... | 136 | mpfi | 3.19 | 3.40 | 8.00 | 115 | 5500 | 18 | 22 | 17450 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 200 | -1 | 95 | volvo | gas | std | four | sedan | rwd | front | 109.10 | ... | 141 | mpfi | 3.78 | 3.15 | 9.50 | 114 | 5400 | 23 | 28 | 16845 |
| 201 | -1 | 95 | volvo | gas | turbo | four | sedan | rwd | front | 109.10 | ... | 141 | mpfi | 3.78 | 3.15 | 8.70 | 160 | 5300 | 19 | 25 | 19045 |
| 202 | -1 | 95 | volvo | gas | std | four | sedan | rwd | front | 109.10 | ... | 173 | mpfi | 3.58 | 2.87 | 8.80 | 134 | 5500 | 18 | 23 | 21485 |
| 203 | -1 | 95 | volvo | diesel | turbo | four | sedan | rwd | front | 109.10 | ... | 145 | idi | 3.01 | 3.40 | 23.00 | 106 | 4800 | 26 | 27 | 22470 |
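One caveat about this approach: csv.reader returns every field as a string, so even columns such as price hold text, and this file uses "?" for missing values (visible in normalized_losses). A minimal sketch, using a tiny sample of values taken from the output above, converts the columns with pd.to_numeric and errors="coerce", which turns unparseable entries like "?" into NaN.

```python
import pandas as pd

# A tiny sample shaped like the output above: csv.reader yields strings only.
data_list = [["3", "?", "13495"], ["2", "164", "13950"]]
df = pd.DataFrame(data_list, columns=["symboling", "normalized_losses", "price"])

# apply() passes errors="coerce" on to pd.to_numeric for every column;
# values that cannot be parsed as numbers, such as "?", become NaN.
numeric_df = df.apply(pd.to_numeric, errors="coerce")
print(numeric_df.dtypes)
print(numeric_df)
```

Assuming the same file layout, pandas can also do the parsing in one step: pd.read_csv("data/imports-85.data", header=None, names=columns, na_values="?") reads the file, applies the column names and marks "?" as missing.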


### 5. Pandas DataFrame From CSV Files

<li>Using pandas, we can load a CSV file and create a DataFrame from the data inside it.</li>
<li>The <b>pd.read_csv()</b> function reads a CSV file and returns a pandas DataFrame containing the dataset.</li>
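pd.read_csv accepts a file path, a URL, or any file-like object, which makes it easy to experiment without a file on disk. In this sketch the CSV text is a hypothetical stand-in built from rows that appear in the outputs further down. It also shows that read_csv infers one data type per column: windspeed parses as integers, while temperature stays text because of the "not available" entry.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for data/weather_data.csv.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
"""

# The first line becomes the header; dtypes are inferred column by column.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)
```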

In [48]:
weather_df = pd.read_csv("data/weather_data.csv", names=["day", "temperature", "windspeed", "event"])
weather_df.head()

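|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | kfjkdfjskd | NaN | NaN | NaN |
| 1 | dfuhsdjufio | NaN | NaN | NaN |
| 2 | day | temperature | windspeed | event |
| 3 | 1/1/2017 | 32 | 6 | Rain |
| 4 | 1/4/2017 | not available | 9 | Sunny |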


# Reading a CSV file using the skiprows and header parameters

In [41]:
weather_df = pd.read_csv("data/weather_data.csv", skiprows=2)
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |


In [36]:
weather_df = pd.read_csv("data/weather_data.csv", header=2)
weather_df.head()

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
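The two calls above reach the same result from different directions: skiprows=2 discards the first two lines so the next line is read as the header, while header=2 names line 2 (zero-based) as the header row and drops the lines before it. A sketch under the assumption that the file starts with two junk lines, recreated here as in-memory text from the first output of this section, compares the two results directly.

```python
import io

import pandas as pd

# Recreated from the outputs above: two junk lines precede the real header.
csv_text = """kfjkdfjskd
dfuhsdjufio
day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
"""

by_skiprows = pd.read_csv(io.StringIO(csv_text), skiprows=2)
by_header = pd.read_csv(io.StringIO(csv_text), header=2)

print(by_skiprows)
print(by_skiprows.equals(by_header))
```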


# Reading a CSV file without a header and giving names to the columns

In [45]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None, names=["date", "temperature", "windspeed", "event"])
weather_df_header.head()

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
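Besides an integer, skiprows also accepts a list of zero-based line numbers or a callable that returns True for each line number to skip, which helps when the unwanted lines are not all at the top. In this sketch the in-memory text is a hypothetical recreation of the file's first lines, and all three forms skip the same three lines (two junk lines plus the original header).

```python
import io

import pandas as pd

# Recreated from the outputs above: two junk lines, the original header, then data.
csv_text = """kfjkdfjskd
dfuhsdjufio
day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
"""
names = ["date", "temperature", "windspeed", "event"]

# An integer skips that many lines from the top.
by_int = pd.read_csv(io.StringIO(csv_text), skiprows=3, header=None, names=names)

# A list skips exactly the listed line numbers.
by_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1, 2], header=None, names=names)

# A callable is evaluated per line number; True means "skip this line".
by_callable = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i < 3, header=None, names=names)

print(by_list)
```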


# Reading a limited number of rows from a CSV file using the nrows parameter

In [49]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None, names=["date", "temperature", "windspeed", "event"], nrows=8)
weather_df_header

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
| 5 | 1/8/2017 | not available | not measured | Sunny |
| 6 | 1/9/2017 | not available | not measured | no event |
| 7 | 1/10/2017 | 34 | 8 | Cloudy |


In [50]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None, names=["date", "temperature", "windspeed", "event"], nrows=5)
weather_df_header

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32 | 6 | Rain |
| 1 | 1/4/2017 | not available | 9 | Sunny |
| 2 | 1/5/2017 | -1 | not measured | Snow |
| 3 | 1/6/2017 | not available | 7 | no event |
| 4 | 1/7/2017 | 32 | not measured | Rain |
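nrows always counts from the first data row, but it can be combined with skiprows to read a "window" from the middle of a file while keeping the header. A minimal sketch, assuming a clean header on line 0 (hypothetical in-memory text built from rows shown above): skiprows=range(1, 3) drops data lines 1 and 2, then nrows=2 reads the next two rows.

```python
import io

import pandas as pd

# Hypothetical CSV text built from rows shown in the outputs above.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
1/5/2017,-1,not measured,Snow
1/6/2017,not available,7,no event
1/7/2017,32,not measured,Rain
"""

# Keep the header (line 0), skip lines 1 and 2, then read only 2 rows.
window = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3), nrows=2)
print(window)
```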


# Reading CSV files with the na_values parameter ('weather_data.csv' file)

In [53]:
weather_df_header = pd.read_csv("data/weather_data.csv", skiprows=3, header=None, names=["date", "temperature", "windspeed", "event"], nrows=5, na_values=["not available", "not measured", "no event"])
weather_df_header

|   | date | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 1/1/2017 | 32.0 | 6.0 | Rain |
| 1 | 1/4/2017 | NaN | 9.0 | Sunny |
| 2 | 1/5/2017 | -1.0 | NaN | Snow |
| 3 | 1/6/2017 | NaN | 7.0 | NaN |
| 4 | 1/7/2017 | 32.0 | NaN | Rain |
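A list passed to na_values applies every marker to every column. na_values also accepts a dictionary that maps column names to their own markers, which is useful when a string is a missing-value marker in one column but a real value in another. In this sketch (hypothetical in-memory text built from the rows above), "no event" is kept as a legitimate category in the event column, while the numeric columns still get NaN and convert to floats.

```python
import io

import pandas as pd

# Hypothetical CSV text built from rows shown in the outputs above.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,not available,9,Sunny
1/5/2017,-1,not measured,Snow
1/6/2017,not available,7,no event
"""

# Each marker is treated as missing only in the column it is listed for.
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"temperature": ["not available"], "windspeed": ["not measured"]},
)
print(df)
print(df.isna().sum())  # missing values per column
```

Whether "no event" should count as missing is a modelling choice: treating it as NaN, as in the notebook cell above, loses the information that no weather event occurred.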
