# A. The Pandas Series Object
A Pandas Series object is a **one-dimensional** ***labeled array*** capable of **holding any type of data**. Because a Series is one-dimensional, it has a ***single axis - the index***. As with a list or an array, its ***data is arranged linearly***, one value after another.

In [1]:
# import the libraries 
import pandas as pd

## A.1 Create a series from a list
Let us create a Pandas Series object from a list!

In [2]:
data = [10,20,30,40]
series = pd.Series(data=data)

print(series)

0    10
1    20
2    30
3    40
dtype: int64


#### Explanation:
- The data list contains the values `[10, 20, 30, 40]`.
- The index of the Series is automatically generated as `0, 1, 2, 3`, corresponding to each value.

#### NOTE
- The index is not part of the values. It labels the single *axis* of the Series.
- The entries of the index are called the *axis labels*.

Thus, a Series has three main attributes:
- values
- index
- name (optional) - we have not assigned a name to our series yet!

Let us now try to see the attributes for the above series!

In [3]:
print(f'The values of the series are: {series.values}') # the values attribute returns a NumPy array of the values of the series object

print(f"The index of the series is: {series.index}") # the index here is a RangeIndex object, which can be iterated over

# let us now give a name to our series. Currently there is no name assigned to the series object
print(f"The current name of the series is: {series.name}")

# set the name
series.name = 'Integer Series of 10x'
print(f'The name is set for the series. The current name is: {series.name}')

print(series)

The values of the series are: [10 20 30 40]
The index of the series is: RangeIndex(start=0, stop=4, step=1)
The current name of the series is: None
The name is set for the series. The current name is: Integer Series of 10x
0    10
1    20
2    30
3    40
Name: Integer Series of 10x, dtype: int64


#### How is a `Series` object different from a one-dimensional `NumPy` array?

- The short answer is the **double abstraction of the index**.

Let us now elaborate on this!
The **index** provides two levels of abstraction.
- The **first level of abstraction** - the index lets you label data points, which makes them easy to reference!
- The **second level of abstraction** - the index is a built-in data structure of its own, so you can replace it with a custom index independently of the values. A one-dimensional `NumPy` array has no such structure: its elements are addressed by integer position only.

In [8]:
# Creating a Series with a custom index
data = [10, 20, 30, 40]
index = ['a', 'b', 'c', 'd']

series = pd.Series(data=data, index=index)

print(series)

a    10
b    20
c    30
d    40
dtype: int64
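The two levels of abstraction described above can be sketched side by side with a NumPy array. This is a minimal illustration; the names `arr`, `ser`, and `relabelled`, and the replacement labels `'w'` to `'z'`, are made up for the example.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30, 40]
labels = ['a', 'b', 'c', 'd']

arr = np.array(values)
ser = pd.Series(data=values, index=labels)

# A NumPy array is addressed by integer position only
second_from_array = arr[1]

# First level: a Series can be addressed by label
second_from_series = ser['b']

# Second level: the index can be replaced without touching the values
relabelled = ser.copy()
relabelled.index = ['w', 'x', 'y', 'z']

print(second_from_array, second_from_series)
print(relabelled)
```

Working on a copy keeps the original `ser` and its labels unchanged.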


In [4]:
# We can also provide a data type to the Series object

# Creating a Series with a custom index and an explicit dtype
data = [10, 20, 30, 40]
index = ['a', 'b', 'c', 'd']

series = pd.Series(data=data, index=index, dtype="float")

print(series)

a    10.0
b    20.0
c    30.0
d    40.0
dtype: float64


## A.2 Series as a specialized Dictionary

In [9]:
# Creating a dictionary
data_dict = {'a': 10, 'b': 20, 'c': 30, 'd': 40}

# Converting the dictionary to a Series
# the keys of the dictionary become the index and the values become the data points in the pandas Series object
series_from_dict = pd.Series(data_dict)

print(series_from_dict)

a    10
b    20
c    30
d    40
dtype: int64
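To make the dictionary analogy concrete, the sketch below tries a few dictionary-style operations on a Series, plus one thing a plain dictionary cannot do. The variable names (`prices`, `middle`, and so on) are illustrative only.

```python
import pandas as pd

prices = pd.Series({'a': 10, 'b': 20, 'c': 30, 'd': 40})

# Membership tests look at the index, just as `in` looks at dict keys
has_b = 'b' in prices

# keys() returns the index
keys = list(prices.keys())

# Assigning to a new label adds an entry, like adding a key to a dict
prices['e'] = 50

# Unlike a dict, a Series also supports slicing by label.
# Label slices include both endpoints.
middle = prices['b':'d']

print(has_b, keys)
print(middle)
```

Label-based slicing is where the "specialized" part shows: the index is ordered, so a range of labels is meaningful.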


## A.3 Some properties of Series

### A.3.1 Series is Non-Homogeneous

In [10]:
# The actual data (or values) for a series does not have to be numeric or homogeneous
data_dict = {'a': 10, 'b': 'Harry Potter', 'c': False, 'd': 'Lionel Messi'}

# Converting the dictionary to a Series
series_from_dict = pd.Series(data_dict)

print(series_from_dict)

a              10
b    Harry Potter
c           False
d    Lionel Messi
dtype: object


#### NOTE

The datatype of the Series is now *object* - i.e. a Python object.

- The object data type is also used for a series with string values. In addition, it is used for values that have heterogeneous or mixed types.
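One way to see what the *object* dtype means is to inspect the Python type of each element. The sketch below does this with `Series.map(type)`; the names `mixed`, `element_types`, and `numeric` are made up for the example.

```python
import pandas as pd

mixed = pd.Series({'a': 10, 'b': 'Harry Potter', 'c': False, 'd': 'Lionel Messi'})

# Each element keeps its own Python type inside an object-dtype Series
element_types = mixed.map(type)

print(mixed.dtype)
print(element_types)

# For comparison, a Series of integers only gets a numeric dtype
numeric = pd.Series([10, 20, 30, 40])
print(numeric.dtype)
```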

### A.3.2 A Series can hold null values

In [11]:
import numpy as np

# create a series with null values 
nan_series = pd.Series(data=[12,20,30,np.nan])

nan_series

0    12.0
1    20.0
2    30.0
3     NaN
dtype: float64

**NOTE**
- When a series contains null values, its `size` attribute returns the number of elements including the null values!
- The `count()` method, by contrast, returns only the number of non-null values.

Let us see them with the above example!


In [12]:
print(f'The series with null values is:\n{nan_series}')
print(f'The size of the series with null values is: {nan_series.size}')
print(f'The count of the non-null values inside the series object is: {nan_series.count()}')

The series with null values is:
0    12.0
1    20.0
2    30.0
3     NaN
dtype: float64
The size of the series with null values is: 4
The count of the non-null values inside the series object is: 3
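Besides counting them, null values can be located, dropped, or replaced. The sketch below rebuilds the same series and applies `isna()`, `dropna()`, and `fillna()`. The fill value `0` is an arbitrary choice for the example. Note that the dtype is `float64` because `NaN` is itself a floating-point value.

```python
import numpy as np
import pandas as pd

nan_series = pd.Series(data=[12, 20, 30, np.nan])

# Boolean mask that is True where a value is missing
mask = nan_series.isna()

# New Series without the missing entries (the original is unchanged)
dropped = nan_series.dropna()

# New Series with missing entries replaced by 0 (arbitrary fill value)
filled = nan_series.fillna(0)

print(mask)
print(dropped)
print(filled)
```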


# B. Pandas DataFrame Object

If a **Series** is an ***analog of a one-dimensional array with explicit indices***, a **DataFrame** is an ***analog of a two-dimensional array with explicit row and column indices***. Just as you might think of a two-dimensional array as an ordered sequence of aligned one dimensional columns, you can ***think of a DataFrame as a sequence of aligned Series objects***. Here, by “aligned” we mean that they share the same index.

## B.1 Creating a DataFrame from a Dictionary

Let us create a simple pandas DataFrame object!

In [5]:
df = pd.DataFrame({
    'Age':[20,34,56,78],
    'Name' : ['Ajay', 'Shin', 'Freddy', 'Michael']
})
print(df)
print(type(df))

   Age     Name
0   20     Ajay
1   34     Shin
2   56   Freddy
3   78  Michael
<class 'pandas.core.frame.DataFrame'>


A DataFrame has two indices - namely a `row index` and a `column index` 

In [5]:
# row index
print(f'The row index of the above dataframe is: {df.index}')

# column index
print(f"The column index of the above dataframe is: {df.columns}")

The row index of the above dataframe is: RangeIndex(start=0, stop=4, step=1)
The column index of the above dataframe is: Index(['Age', 'Name'], dtype='object')


### B.1.1 DataFrame as a specialized Dictionary

- A dictionary maps the key to the value, while the DataFrame maps a column name to a `Series` of columnar data

In [6]:
# df as a specialized dictionary
df['Age']

0    20
1    34
2    56
3    78
Name: Age, dtype: int64

In [7]:
# type returned 
type(df["Age"])

pandas.core.series.Series
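The dictionary analogy also covers membership tests and adding entries. The sketch below rebuilds the same DataFrame and treats its columns like dictionary keys. The column name `AgeNextYear` is hypothetical, introduced only to show assignment.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [20, 34, 56, 78],
    'Name': ['Ajay', 'Shin', 'Freddy', 'Michael']
})

# Membership tests look at the column names
has_age = 'Age' in df

# keys() returns the column index
column_names = list(df.keys())

# Assigning to a new key adds a column (hypothetical column name)
df['AgeNextYear'] = df['Age'] + 1

print(has_age, column_names)
print(df)
```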

## B.2 Creating DataFrame from Series 

In [8]:
population_dict = {'California': 39538223, 'Texas': 29145505, 
                   'Florida': 21538187, 'New York': 20201249, 
                   'Pennsylvania': 13002700}

population_series = pd.Series(population_dict)
print(population_series)
print(type(population_series))

print('\n')

population_df = pd.DataFrame(population_series, 
                             columns=['Population'])
print(population_df)
print(type(population_df))

California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64
<class 'pandas.core.series.Series'>


              Population
California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
<class 'pandas.core.frame.DataFrame'>


## B.3 Creating DataFrame from a list of dictionaries

In [9]:
# consider a list of dictionaries
list_dict = [
    {'country':'India', 'rank':23},
    {'country':'China', 'rank':2},
    {'country':'USA', 'rank':1},
    {'country':'Japan', 'rank':10},
    {'country':'UK', 'rank':15},
    {'country':'Taiwan', 'rank':30},
    {'country':'Bangladesh', 'rank':50}
]

# convert this to df
df_new = pd.DataFrame(list_dict)
df_new

      country  rank
0       India    23
1       China     2
2         USA     1
3       Japan    10
4          UK    15
5      Taiwan    30
6  Bangladesh    50
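The dictionaries in the list do not all need the same keys. In the sketch below, the `records` list and the `continent` key are made up for illustration. Any key missing from a record shows up as `NaN` in that row.

```python
import pandas as pd

records = [
    {'country': 'India', 'rank': 23},
    {'country': 'China'},                                          # 'rank' is missing
    {'country': 'USA', 'rank': 1, 'continent': 'North America'},  # extra key
]

# Each distinct key becomes a column; missing entries become NaN
df_records = pd.DataFrame(records)
print(df_records)
```

Because `NaN` is a float, the `rank` column is stored as `float64` here rather than `int64`.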


## B.4 Creating DataFrame from a 2D NumPy array

In [6]:
# create a 2d numpy array
import numpy as np 

arr_numpy = np.random.randint(15, size=(4,4))
print(f'The NumPy array is:\n{arr_numpy}')

# convert this to a df
df_from_numpy = pd.DataFrame(data= arr_numpy,
                             columns=['A','B','C','D'],
                             index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

print(f'The converted dataframe is:\n{df_from_numpy}')

# let us now convert this df back to a NumPy array
arr_numpy_from_df = df_from_numpy.to_numpy()
print(f'The converted numpy array is :\n{arr_numpy_from_df}')

The NumPy array is:
[[ 9 14  4  1]
 [ 8 12  2 14]
 [ 9  6  8 12]
 [13 14  2  3]]
The converted dataframe is:
        A   B  C   D
Row 1   9  14  4   1
Row 2   8  12  2  14
Row 3   9   6  8  12
Row 4  13  14  2   3
The converted numpy array is :
[[ 9 14  4  1]
 [ 8 12  2 14]
 [ 9  6  8 12]
 [13 14  2  3]]


## B.5 DataFrame as a series of aligned Series objects

In [7]:
population_series = pd.Series({'California': 39538223, 'Texas': 29145505, 
                   'Florida': 21538187, 'New York': 20201249, 
                   'Pennsylvania': 13002700})

area_series = pd.Series({'California': 423967, 'Texas': 695662, 'Florida': 170312, 
             'New York': 141297, 'Pennsylvania': 119280})

states = pd.DataFrame({
    'population': population_series,
    'area': area_series
}) # think of states as a combination of two aligned Series objects
states

              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280


In [8]:
# What are the values 
print(states.values) # presented in a numpy array 

[[39538223   423967]
 [29145505   695662]
 [21538187   170312]
 [20201249   141297]
 [13002700   119280]]
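Alignment is easiest to see when the Series do not share exactly the same labels. The sketch below uses deliberately mismatched subsets of the data above; the names `population`, `area`, `states_partial`, and `density` are illustrative. Rows are matched by label rather than by position, and labels missing from one Series are filled with `NaN`.

```python
import pandas as pd

# Two Series with overlapping but different labels, in different orders
population = pd.Series({'California': 39538223, 'Texas': 29145505, 'Florida': 21538187})
area = pd.Series({'Texas': 695662, 'California': 423967, 'New York': 141297})

# The DataFrame index is the union of both indices
states_partial = pd.DataFrame({'population': population, 'area': area})
print(states_partial)

# Column arithmetic is aligned on the shared row index as well
density = states_partial['population'] / states_partial['area']
print(density)
```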


## B.6 Creating DataFrame from a NumPy structured array

In [15]:
# later 
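Until this cell is filled in, here is a minimal sketch of the idea, reusing a few rows from B.1. The field names and dtypes (`'i8'`, `'U10'`) and the variable names are choices made for this example. Each named field of the structured array becomes a column.

```python
import numpy as np
import pandas as pd

# A structured array: every record has a named, typed field
people = np.array(
    [(20, 'Ajay'), (34, 'Shin'), (56, 'Freddy')],
    dtype=[('Age', 'i8'), ('Name', 'U10')]
)

# The field names of the structured array become the column names
df_structured = pd.DataFrame(people)
print(df_structured)
```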

# C. Pandas Index Object

- Both `Series` and `DataFrame` objects have explicit indices. 
- The `Index` object is a very interesting structure and can be thought of as an ***immutable array or an ordered set (although `Index` objects may have repeated values)***

In [None]:
# let us create an index object
index = pd.Index([1,3,5,7,9,11])
print(index)

# index as an immutable array

# like arrays, they can be accessed using Python indexing or slicing
print(index[2]) # 5
print(index[:2]) # Index([1, 3], dtype='int64')

# it also has many attributes in common with NumPy arrays
print(index.size, index.shape, index.ndim, index.dtype)

# however, the index is not mutable
index[2] = 10 # raises TypeError

Index([1, 3, 5, 7, 9, 11], dtype='int64')
5
Index([1, 3], dtype='int64')
6 (6,) 1 int64


TypeError: Index does not support mutable operations

In [22]:
# index as an ordered set

index_1 = pd.Index([1,3,5,7,9,11])
index_2 = pd.Index([1,3,5,2,8,10])

print(index_1.intersection(index_2))
print(index_1.union(index_2))
print(index_1.symmetric_difference(index_2))

Index([1, 3, 5], dtype='int64')
Index([1, 2, 3, 5, 7, 8, 9, 10, 11], dtype='int64')
Index([2, 7, 8, 9, 10, 11], dtype='int64')
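The set analogy has a limit: unlike a Python `set`, an `Index` may hold repeated values. The sketch below shows how to detect this and what a label lookup returns in that case. The names `repeated`, `distinct`, and `matches` are made up for the example.

```python
import pandas as pd

# An Index with a repeated label
repeated = pd.Index([1, 3, 3, 5])

# is_unique reports whether every label occurs only once
is_unique = repeated.is_unique

# unique() returns a new Index with duplicates removed
distinct = repeated.unique()

# Looking up a repeated label in a Series returns every matching row
s = pd.Series([10, 20, 30, 40], index=repeated)
matches = s.loc[3]

print(is_unique)
print(distinct)
print(matches)
```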


END!