# Introduction to pandas:
[*pandas*](http://pandas.pydata.org/) is a column-oriented data analysis API.
It's a great tool for handling and analyzing input data, and many ML frameworks
support *pandas* data structures as inputs.  

Although a comprehensive
introduction to the API would span many pages, the core concepts are fairly
straightforward, and we'll present them below. For a more complete reference,
the [*pandas* docs site](http://pandas.pydata.org/pandas-docs/stable/index.html)
contains extensive documentation and many tutorials. (Note that Colab may ship a
slightly older version of *pandas*, but the parts covered here are unlikely to
differ from version to version.)

In [14]:
import pandas as pd
pd.__version__

'2.2.2'

The primary data structures in *pandas* are implemented as two classes:
* **`Series`**, which is a single column. Each row can be labeled via an index.
* **`DataFrame`**, which you can imagine as a relational data table, with rows and named columns. A `DataFrame` contains one or more `Series` and a name for each `Series`.

The data frame is a commonly used abstraction for data manipulation. Similar implementations exist in Spark and R.

### Series:
Think of a `Series` as:
* A single column of data
* Like a list, but with superpowers:
    * It has values
    * It has labels (called an index)
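
The two parts of a `Series` described above can be inspected directly. The following is a minimal sketch, not part of the original notebook; the variable name `sample` is illustrative, and it assumes the default integer index that pandas assigns when none is given.

```python
import pandas as pd

# Illustrative Series; with no explicit index, pandas assigns 0..n-1.
sample = pd.Series(["chennai", "mumbai", "kolkata", "delhi"])

values = sample.to_list()        # the data itself, as a plain Python list
labels = sample.index.to_list()  # the index labels, here the default integers

print(values)
print(labels)
```

Keeping values and labels separate is what lets pandas match rows by label rather than by position when it combines several `Series`.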

In [15]:
cities = pd.Series(["chennai", "mumbai", "kolkata", "delhi"])
cities


,0
0,chennai
1,mumbai
2,kolkata
3,delhi


In [16]:
type(cities)

We can also supply the index labels ourselves, for example by passing a dict whose keys become the labels:

In [17]:
cities = pd.Series({"south":"Chennai", "west":"Mumbai", "east":"Kolkata", "north":"New Delhi"})

cities

,0
south,Chennai
west,Mumbai
east,Kolkata
north,New Delhi
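
Once a `Series` carries custom labels, values can be looked up by label as well as by position. This is a small sketch under the assumption that the labels are unique; the name `regions` is illustrative, and `.loc` and `.iloc` are the standard pandas accessors for label-based and position-based selection.

```python
import pandas as pd

# Illustrative Series built from a dict: keys become the index labels.
regions = pd.Series(
    {"south": "Chennai", "west": "Mumbai", "east": "Kolkata", "north": "New Delhi"}
)

by_label = regions.loc["east"]            # label-based lookup
by_position = regions.iloc[0]             # position-based lookup (first row)
subset = regions.loc[["north", "south"]]  # a list of labels returns a Series

print(by_label)
print(by_position)
print(subset.to_list())
```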


### DataFrame:
* A `DataFrame` is a set of `Series` placed side by side, each with a column name.
* Its rows are labeled by an index.
* Its columns have names, and each column is a `Series` under the hood.
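
To make the relationship between the two classes concrete, here is a short sketch showing that selecting a single column gives back a `Series` that shares the row index of its parent. The name `frame` is illustrative and the data is a trimmed version of the values used in this notebook.

```python
import pandas as pd

# Illustrative DataFrame built from plain lists of equal length.
frame = pd.DataFrame(
    {
        "cities": ["chennai", "mumbai", "kolkata"],
        "population": [700000, 1700000, 800000],
    }
)

column = frame["population"]             # selecting one column yields a Series
print(type(column).__name__)             # the class name of the selected column
print(column.name)                       # the column name travels with the Series
print(frame.index.equals(column.index))  # the column shares the frame's row index
```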



In [18]:
cities = pd.Series(["chennai", "mumbai", "kolkata", "delhi"])
# population has only three values; pandas aligns on the index, so the last row gets NaN
population = pd.Series([700000, 1700000, 800000])

city_info_df = pd.DataFrame({"cities": cities, "population": population})
city_info_df

,cities,population
0,chennai,700000.0
1,mumbai,1700000.0
2,kolkata,800000.0
3,delhi,NaN
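
The missing population in the last row is a result of index alignment: the shorter `Series` has no label `3`, so pandas fills that cell with `NaN`, and the column is stored as floating point because `NaN` is a float. The sketch below reproduces this with illustrative names (`names`, `counts`, `frame`) and shows one way to find the affected rows using `isna()`.

```python
import pandas as pd

names = pd.Series(["chennai", "mumbai", "kolkata", "delhi"])
counts = pd.Series([700000, 1700000, 800000])  # one value short on purpose

# Building from a dict of Series aligns rows on the union of the indexes.
frame = pd.DataFrame({"cities": names, "population": counts})

print(counts.dtype)               # integer dtype before alignment
print(frame["population"].dtype)  # float dtype after NaN is introduced

# Boolean mask selecting rows whose population is missing.
missing_rows = frame[frame["population"].isna()]
print(missing_rows["cities"].to_list())
```

If the gap is unintentional, supplying a fourth value (or an explicit index on both `Series`) avoids the `NaN`.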


In [19]:
type(city_info_df)