## Introduction to Pandas
Pandas is a powerful and flexible open-source data analysis and manipulation library for the Python programming language. It provides data structures and functions needed to work with structured data seamlessly, particularly data tables similar to those found in relational databases or spreadsheets.

### Why Pandas?

Pandas was created to fill the need for a flexible and easy-to-use data manipulation tool that was missing in the Python ecosystem. Before Pandas, Python was less commonly used for data analysis compared to languages like R, which had more advanced data manipulation capabilities.

Key motivations for the development of Pandas include:

  * Handling Real-World Data: Most real-world data is messy and requires extensive cleaning and transformation. Pandas provides tools to handle missing data, filter rows and columns, and merge datasets efficiently.

  * Ease of Use: Pandas offers a user-friendly interface that abstracts many of the complexities of data manipulation, making it accessible to both novice programmers and experienced data scientists.

  * Performance: Built on top of NumPy, Pandas uses fast, vectorized array operations, which makes it efficient for datasets that fit in memory.

  * Integration with Other Tools: Pandas integrates well with other libraries in the Python ecosystem, such as Matplotlib for visualization and SciPy for scientific computing, providing a comprehensive environment for data analysis.

#### Installing and Importing Pandas:
1. Installation:

Pandas ships with the Anaconda distribution. In a conda environment that does not have it yet, install it with:
`conda install -c conda-forge pandas`

Alternatively, install it with pip, the Python package installer: `pip install pandas`

2. Importing into a file:

Pandas is imported into a working file using `import`, and more often than not, the alias `pd` is used.


In [1]:
# installing
!pip install pandas



In [2]:
# importing
import pandas as pd

## Handling Data with Pandas
When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean, and process your data. In pandas, a data table is called a DataFrame.

A DataFrame consists of rows and columns. Let's see what a DataFrame looks like.


![alt text](https://docs.google.com/uc?export=download&id=1-WnqXG18kG9doRMVADQq8Lt9Xe5BTttk)

In the above image, you can see a table of data storing information about 4 cars. The 4 cars are kept in separate rows while the attributes (features) of the cars are kept in separate columns. The features may be:
1. Categorical:
  
  * Nominal: not ordered, mutually exclusive. eg: Car, House, Man
  * Ordinal: ordered, mutually exclusive. eg: low, medium, high

2. Numerical:

  * Discrete: Only particular numbers. eg: Counts
  * Continuous: Any numerical values. eg: Height in cm

In the above figure, 'color' is a categorical column holding categorical values, while 'number of doors' is a numerical column.
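The four feature types above can be represented with pandas dtypes. The following is a minimal sketch, not taken from the figure: the `price_band` and `length_m` columns and all of their values are hypothetical and exist only to illustrate an ordinal and a continuous feature.

```python
import pandas as pd

# Hypothetical car data; price_band and length_m are made up for illustration.
cars = pd.DataFrame({
    "color": ["red", "blue", "white", "white"],        # nominal
    "price_band": ["high", "high", "medium", "low"],   # ordinal
    "number_of_doors": [4, 2, 2, 4],                   # discrete
    "length_m": [4.4, 4.6, 4.8, 4.3],                  # continuous
})

# Nominal: categories without any order.
cars["color"] = cars["color"].astype("category")

# Ordinal: categories with an explicit order (low < medium < high).
band_type = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)
cars["price_band"] = cars["price_band"].astype(band_type)

print(cars.dtypes)

# Ordered categories support comparisons; unordered ones do not.
at_least_medium = cars["price_band"] >= "medium"
print(at_least_medium)
```

Declaring the order explicitly is what lets pandas sort and compare ordinal values by meaning rather than alphabetically.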

### Series vs DataFrame
A DataFrame is made up of multiple Series. A DataFrame can therefore hold heterogeneous data, with each column having its own type, while a single Series holds one-dimensional data that is typically of a single type.

In the above figure, if you look at one individual column, "Company," for example, this is a Series. There are four columns, which means that this table is made up of four pandas series put together. If you are wondering about the numbers 0, 1, 2, and 3 on the left, they are just indices, so we don't count them.

Alternatively, you can also look at each row as a Series. The first row, where it says "Company," "Automatic shift," etc., holds the column names. These play the same role as indices, so we do not count that row; it simply describes what the data means. If you look at it this way, there are four rows, so there are four pandas Series.
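Both views can be checked directly in code. The sketch below rebuilds the cars table (using the values that appear later in this section) and pulls out one column and one row; each comes back as a Series.

```python
import pandas as pd

# Cars table, rebuilt here so the example is self-contained.
df = pd.DataFrame({
    "Company": ["Ford", "Ferrari", "Lamborghini", "Toyota"],
    "Automatic shift": [1, 1, 1, 0],
    "Color": ["red", "blue", "white", "white"],
    "Number of doors": [4, 2, 2, 4],
})

company = df["Company"]   # one column: a Series indexed by the row labels 0..3
first_row = df.iloc[0]    # one row: a Series indexed by the column names

print(type(company).__name__, list(company.index))
print(type(first_row).__name__, list(first_row.index))
print(df.dtypes)          # each column keeps its own dtype
```

Note that a row Series mixes text and numbers, so pandas has to store it with a general-purpose dtype, whereas a column Series keeps the specific dtype of that column.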

Let us create a pandas Series.

In [3]:
name_list = ['Ford', 'Ferrari', 'Lamborghini', 'Toyota']

# Create Series
names = pd.Series(name_list)

names

0           Ford
1        Ferrari
2    Lamborghini
3         Toyota
dtype: object


#### values and index properties of series:

 `series.values` returns the underlying array of data: ['Ford' 'Ferrari' 'Lamborghini' 'Toyota'].

 `series.index` returns the index object, which by default is a RangeIndex starting from 0, stopping at 4 (exclusive), with a step of 1. It labels the positions of the elements in the Series.

In [4]:
print(names.values)
print(names.index)

['Ford' 'Ferrari' 'Lamborghini' 'Toyota']
RangeIndex(start=0, stop=4, step=1)


### Indexing in Series

A Series looks like a one-dimensional NumPy array. However, while the index of a NumPy array is implicitly defined by position, a pandas Series has an explicitly defined index.

Because the index is explicit, a Series is not limited to integer indices; it can also use strings as index labels.

In [5]:
data = pd.Series([5, 10, 15, 20], index=['a', 'b', 'c', 'd'])
data

a     5
b    10
c    15
d    20
dtype: int64


### Accessing data from the series

In [6]:
# positional indexing/slicing, as in NumPy (.iloc selects by position)
print(data.iloc[0])
print(data.iloc[0:4])

5
a     5
b    10
c    15
d    20
dtype: int64


In [7]:
# accessing data using the actual index label
print(data['a'])

5
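When the index holds string labels, it helps to state explicitly whether a lookup is by position or by label. The sketch below uses `.iloc` for positions and `.loc` for labels on the same Series; the variable names are illustrative only. One difference worth noticing is that a position slice excludes its stop, while a label slice includes it.

```python
import pandas as pd

data = pd.Series([5, 10, 15, 20], index=["a", "b", "c", "d"])

by_position = data.iloc[0]        # first element, whatever its label is
by_label = data.loc["a"]          # element whose label is 'a'

pos_slice = data.iloc[0:2]        # positions 0 and 1; the stop is excluded
label_slice = data.loc["a":"c"]   # labels 'a' through 'c'; the stop is included

print(by_position, by_label)                 # 5 5
print(list(pos_slice), list(label_slice))    # [5, 10] [5, 10, 15]
```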


### Saving and loading file:

`series.to_csv` writes a Series to a CSV file, and `pd.read_csv` reads a CSV file back in.

In [8]:
data.to_csv('abcd.csv')

In [12]:
read_data = pd.read_csv('abcd.csv', index_col=0)
read_data

    0
a   5
b  10
c  15
d  20
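Note that `pd.read_csv` returns a DataFrame, even when the file was written from a Series. If a Series is needed again, one option is to squeeze the single column, as in the sketch below. The temporary directory is an assumption made here so that the example leaves no files behind; a plain file name works just as well.

```python
import os
import tempfile

import pandas as pd

data = pd.Series([5, 10, 15, 20], index=["a", "b", "c", "d"])

# Write to a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "abcd.csv")
    data.to_csv(path)

    loaded = pd.read_csv(path, index_col=0)   # a one-column DataFrame
    restored = loaded.squeeze("columns")      # collapse it back to a Series

print(type(loaded).__name__)     # DataFrame
print(type(restored).__name__)   # Series
print(restored["b"])             # 10
```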


### Assignment 1:

Write a Python script to create a Series from a list. Save the Series to a .csv file, and then load it. Then prompt the user for an index and display the matching element(s) of the Series.

In [13]:
## Assignment 1 code:


### DataFrame Creation
A DataFrame has multiple columns, so let's create one. First, let's create an empty DataFrame to which we can add columns as needed.

In [14]:
#create empty dataframe
df = pd.DataFrame()

print(df)

Empty DataFrame
Columns: []
Index: []


In [15]:
name_list = ['Ford', 'Ferrari', 'Lamborghini', 'Toyota']

# Create DataFrame from a dict that maps the column name to the list
df = pd.DataFrame(data = {'Company': name_list})

print(df)


       Company
0         Ford
1      Ferrari
2  Lamborghini
3       Toyota


Let's create a DataFrame with more columns.
We can either build the DataFrame directly from the lists, or first create a Series from each list and then assemble the DataFrame from those Series.

In [16]:
name_list = ['Ford', 'Ferrari', 'Lamborghini', 'Toyota']
shift_list = [1,1,1,0]
color_list = ['red', 'blue', 'white', 'white']
door_list = [4,2,2,4]

# Create series first and later use them.
names = pd.Series(name_list)
shift = pd.Series(shift_list)
color = pd.Series(color_list)
door = pd.Series(door_list)

#create dataframe
df = pd.DataFrame()
df['Company'] = names
df['Automatic shift'] = shift
df['Color'] = color
df['Number of doors'] = door

#display dataframe
df

       Company  Automatic shift  Color  Number of doors
0         Ford                1    red                4
1      Ferrari                1   blue                2
2  Lamborghini                1  white                2
3       Toyota                0  white                4


Let's create the DataFrame directly from the lists.

In [17]:
name_list = ['Ford', 'Ferrari', 'Lamborghini', 'Toyota']
shift_list = [1,1,1,0]
color_list = ['red', 'blue', 'white', 'white']
door_list = [4,2,2,4]

#create dataframe
df = pd.DataFrame(data={'Company':name_list,
                        'Automatic shift':shift_list,
                        'Color':color_list,
                        'Number of doors':door_list})

#display dataframe
df

       Company  Automatic shift  Color  Number of doors
0         Ford                1    red                4
1      Ferrari                1   blue                2
2  Lamborghini                1  white                2
3       Toyota                0  white                4


In [18]:
dfi=df.set_index('Color')

In [19]:
dfi

           Company  Automatic shift  Number of doors
Color
red           Ford                1                4
blue       Ferrari                1                2
white  Lamborghini                1                2
white       Toyota                0                4
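`set_index` turns a column into the row labels, which changes how rows are looked up. The sketch below shows a few common ways to access elements of a DataFrame, by label with `.loc` and by position with `.iloc`; the variable names other than `df` and `dfi` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["Ford", "Ferrari", "Lamborghini", "Toyota"],
    "Automatic shift": [1, 1, 1, 0],
    "Color": ["red", "blue", "white", "white"],
    "Number of doors": [4, 2, 2, 4],
})

# set_index returns a new DataFrame; df itself is left unchanged.
dfi = df.set_index("Color")

company_of_row_1 = df.loc[1, "Company"]   # row label 1, column 'Company'
first_cell = df.iloc[0, 0]                # row position 0, column position 0
white_rows = dfi.loc["white"]             # every row labelled 'white'

print(company_of_row_1)   # Ferrari
print(first_cell)         # Ford
print(white_rows)
```

Because index labels do not have to be unique, looking up `'white'` returns both matching rows rather than a single one.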


### Assignment 2:

Create a DataFrame from lists, save it to a .csv file, load it, and then access an element of the DataFrame.

In [21]:
## Assignment 2 code: