# Introduction to Pandas (Python and Data Analysis)

We start by understanding the difference between a "Series" and a "DataFrame":

- A Series is a single column of data, with an index that labels each value
- A DataFrame is the entire sheet of data, made up of one or more columns that share an index
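
The relationship between the two can be sketched as follows. The table and its column names (`name`, `age`) are made up for illustration and do not come from the examples below; the point is only that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# Hypothetical two-column table, used only for illustration
table = pd.DataFrame({"name": ["Ann", "Bob"], "age": [28, 35]})

# Selecting a single column returns a Series that shares the table's index
age_column = table["age"]

print(type(table).__name__)       # DataFrame
print(type(age_column).__name__)  # Series
```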

In [2]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [3]:
x = Series([30,40,50]) # The output shows the index (left column) and the values (right column)
x

0    30
1    40
2    50
dtype: int64

In [4]:
x.index

RangeIndex(start=0, stop=3, step=1)

In [5]:
x.values

array([30, 40, 50], dtype=int64)

## Creating Series with Index 

By default the index is a range starting from 0. If necessary, one can specify the index.

In [6]:
sales = Series([45000,65000,87000], index=["Don", "Mike", "Edward"])
sales

Don       45000
Mike      65000
Edward    87000
dtype: int64

In [7]:
type(sales)

pandas.core.series.Series

### Checking for a specific value

In [9]:
sales["Don"]

45000

### Checking for conditions

In [10]:
sales[sales>50000]

Mike      65000
Edward    87000
dtype: int64
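
The expression inside the brackets is itself a Series of booleans, which pandas uses as a mask. A minimal sketch that rebuilds the `sales` series from above; the names `mask` and `mid_range` and the 80000 threshold are introduced here only for illustration.

```python
import pandas as pd

sales = pd.Series([45000, 65000, 87000], index=["Don", "Mike", "Edward"])

# The comparison produces a boolean Series with the same index as sales
mask = sales > 50000
print(mask)

# Conditions combine with & (and) and | (or); each one needs its own parentheses
mid_range = sales[(sales > 50000) & (sales < 80000)]
print(mid_range)
```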

### Checking for existence of a value by key

In [11]:
"Don" in sales

True

In [12]:
"John" in sales

False
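
Note that `in` tests the index labels, not the stored values. The sketch below illustrates the difference; the variable names are illustrative, and searching `sales.values` is just one possible way to look for a value.

```python
import pandas as pd

sales = pd.Series([45000, 65000, 87000], index=["Don", "Mike", "Edward"])

label_found = "Don" in sales          # "Don" is an index label
value_as_label = 45000 in sales       # 45000 is a value, not a label
value_found = 45000 in sales.values   # searches the underlying array instead

print(label_found, value_as_label, value_found)
```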

### Converting series to dictionaries

In [13]:
sales_dict = sales.to_dict()
sales_dict

{'Don': 45000, 'Mike': 65000, 'Edward': 87000}

### Converting dictionaries to series

In [16]:
sales_series = Series(sales_dict)
sales_series

Don       45000
Mike      65000
Edward    87000
dtype: int64

### Creating a series from a dictionary with a specified index

In [17]:
new = ["Don", "Mike", "Edward", "John"]

In [18]:
ssales = Series(sales_dict, index=new)
ssales

Don       45000.0
Mike      65000.0
Edward    87000.0
John          NaN
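dtype: float64

"John" is not a key in the dictionary, so his entry is NaN (missing). Because NaN is a floating-point value, the whole series becomes float64.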

### Finding Null values

In [19]:
pd.isnull(ssales)

Don       False
Mike      False
Edward    False
John       True
dtype: bool
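
The boolean result can also be used to count or filter missing entries. A small sketch, assuming a series with the same contents as `ssales` above; `missing_count` and `present` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

ssales = pd.Series([45000.0, 65000.0, 87000.0, np.nan],
                   index=["Don", "Mike", "Edward", "John"])

# True counts as 1, so summing the mask counts the missing entries
missing_count = ssales.isnull().sum()

# notnull() is the inverse mask; indexing with it keeps only the present values
present = ssales[ssales.notnull()]

print(missing_count)
print(present)
```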

### Adding values in series

In [22]:
ssales = sales + ssales
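
Addition matches entries by index label rather than by position, and a label that is missing on either side produces NaN. The sketch below rebuilds both series from the earlier cells to show this; `total` is a name introduced for illustration.

```python
import pandas as pd

sales = pd.Series([45000, 65000, 87000], index=["Don", "Mike", "Edward"])
ssales = pd.Series({"Don": 45000, "Mike": 65000, "Edward": 87000},
                   index=["Don", "Mike", "Edward", "John"])

# Values are paired by label: Don with Don, Mike with Mike, and so on
total = sales + ssales

print(total["Don"])              # 45000 + 45000.0 = 90000.0
print(pd.isnull(total["John"]))  # John has no entry in sales, so the sum is NaN
```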

### Naming a series

In [23]:
ssales.name = "Total Sales"

In [24]:
ssales

Don        90000.0
Edward    174000.0
John           NaN
Mike      130000.0
Name: Total Sales, dtype: float64

### Naming an index

In [25]:
ssales.index.name = "Sales Person"
ssales

Sales Person
Don        90000.0
Edward    174000.0
John           NaN
Mike      130000.0
Name: Total Sales, dtype: float64

## Creating a DataFrame

### Creating a DataFrame from a list

In [26]:
import pandas as pd

In [27]:
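data = [["Adrian", 20], ["Beatrice", 32], ["Chloe", 41]]
df = pd.DataFrame(data, columns=["Name", "Age"])
df["Age"] = df["Age"].astype(int)  # cast only the Age column to integer
df

|   | Name | Age |
|---|------|-----|
| 0 | Adrian | 20 |
| 1 | Beatrice | 32 |
| 2 | Chloe | 41 |

We cast the Age column to an integer type explicitly. We could cast it to a float instead, but that would not make much sense for an age value, since each age would then be displayed as 20.0, 32.0, and so on. Note that passing `dtype=int` to the `DataFrame` constructor applies to every column, and recent pandas versions raise an error because the Name column holds strings.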
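
To see how the choice of type affects the Age column, the per-column types can be inspected with `df.dtypes`. A minimal sketch that rebuilds the same data without any `dtype` argument; `as_float` is a name introduced for illustration.

```python
import pandas as pd

data = [["Adrian", 20], ["Beatrice", 32], ["Chloe", 41]]
df = pd.DataFrame(data, columns=["Name", "Age"])

# Each column has its own type; Age is inferred as an integer column
print(df.dtypes)

# Casting Age to float turns 20 into 20.0, and so on
as_float = df["Age"].astype(float)
print(as_float.tolist())  # [20.0, 32.0, 41.0]
```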

### Creating a DataFrame from dictionaries - using the default index

In [28]:
new = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Sales": [25000,30000,35000,40000]}
df2 = pd.DataFrame(new)
df2

|   | Name | Sales |
|---|------|-------|
| 0 | Tom | 25000 |
| 1 | Jack | 30000 |
| 2 | Steve | 35000 |
| 3 | Ricky | 40000 |


### Creating a DataFrame from dictionaries - using a specified index

In [29]:
new2 = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Sales": [25000,30000,35000,40000]}
df3 = pd.DataFrame(new2, index=["rank1", "rank2", "rank3", "rank4"])
df3

|   | Name | Sales |
|---|------|-------|
| rank1 | Tom | 25000 |
| rank2 | Jack | 30000 |
| rank3 | Steve | 35000 |
| rank4 | Ricky | 40000 |
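
One reason to specify an index is that rows can then be looked up by label. The sketch below uses `.loc` for this; `.loc` is not covered in the text above and is shown only as an illustration, and `row` is a name introduced here.

```python
import pandas as pd

new2 = {"Name": ["Tom", "Jack", "Steve", "Ricky"],
        "Sales": [25000, 30000, 35000, 40000]}
df3 = pd.DataFrame(new2, index=["rank1", "rank2", "rank3", "rank4"])

# .loc selects a row by its index label and returns it as a Series
row = df3.loc["rank2"]
print(row["Name"], row["Sales"])  # Jack 30000
```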
