In [4]:
import pandas as pd
import numpy as np

In the first chapter we introduced Pandas and saw why it is worth learning.


## Pandas Data Structures 

Pandas comes with two core data structures: Series and DataFrame.

1 - **Series**:
- is a one-dimensional labeled array capable of holding any data type.![Series](imgs/series.png "Series")


2 - **DataFrame**
- is a 2-dimensional labeled data structure with columns of potentially different types.![DataFrame](imgs/dataframe.png "DataFrame")

  Think of it as an Excel sheet: a **Series** is a single column and a **DataFrame** has multiple columns.

  DataFrames and Series are quite similar: many operations that work on one also work on the other, such as filling in null values and calculating the mean.

  A **DataFrame** can therefore be seen as a collection of **Series** that share the same index.
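
As a small sketch of the points above (the column names `price` and `qty` are made up for illustration), selecting one column of a DataFrame gives back a Series, and both objects accept the same `fillna` and `mean` calls:

```python
import pandas as pd

# Hypothetical data: two columns, one of them with a missing value
df = pd.DataFrame({"price": [1.0, None, 3.0], "qty": [10, 20, 30]})

# Selecting a single column returns a Series
price = df["price"]
print(type(price))

# The same operations work on both objects
print(price.fillna(0).mean())  # mean of one column
print(df.fillna(0).mean())     # mean of every column, returned as a Series
```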






## How to create a Pandas Series?

To create a pandas Series we use the **Series()** constructor.
```
pd.Series(
    data=None,
    index=None,
    dtype=None,
    name=None,
    copy=False,
    fastpath=False,
)
```
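
The arguments used most often are `data`, `index`, `dtype` and `name`. A minimal sketch of how they fit together, using made-up labels and the hypothetical name `scores`:

```python
import pandas as pd

scores = pd.Series(
    data=[1, 2, 3],         # the values
    index=["a", "b", "c"],  # one label per value
    dtype="float64",        # force the values to be stored as floats
    name="scores",          # a name for the Series itself
)
print(scores)
```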


#### 1 - Create a Series from an array

In [23]:
data = pd.array(['a','b','c'])
array_series = pd.Series(data)
array_series

0    a
1    b
2    c
dtype: string

#### 2 - Create series from ndarray

In [34]:
data = np.array(['a','b','c'])
idx_lbl = [1,2,3]
ndarray_series = pd.Series(data,index=idx_lbl)
ndarray_series

# access an element by its index label
# ndarray_series[1]

1    a
2    b
3    c
dtype: object
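
Because the labels here are integers starting at 1, a label and a position can point to different elements. A short sketch using `.loc` (label-based) and `.iloc` (position-based), which makes the intent explicit. The variable names are only for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array(["a", "b", "c"]), index=[1, 2, 3])

by_label = s.loc[1]      # the element whose label is 1
by_position = s.iloc[1]  # the second element, counting from 0
print(by_label, by_position)
```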

#### 3 - Create a Series from a dict

In [33]:
# when using a dict, its keys become the Series index
my_dict = {1:'a',2:'b',3:'c'}

dict_series = pd.Series(my_dict)

dict_series

# access an element by its index label
dict_series[1]

'a'
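
When a dict is passed together with an explicit `index`, pandas picks the values whose keys match the index labels, in the order of the index, and leaves a missing value for any label that has no matching key. A minimal sketch (the name `picked` is just for illustration):

```python
import pandas as pd

my_dict = {1: "a", 2: "b", 3: "c"}

# Label 4 is not a key of the dict, so its value is missing
picked = pd.Series(my_dict, index=[3, 1, 4])
print(picked)
```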

#### 4 - Create a Series from a scalar value

In [38]:
# if data is a scalar value, an index must be provided. The value will be repeated to match the length of index.
scalar_series = pd.Series(5, index=['a', 'b', 'c', 'd', 'e'])
scalar_series

a    5
b    5
c    5
d    5
e    5
dtype: int64

## How to create a Pandas DataFrame?

To create a pandas DataFrame we use the **DataFrame()** constructor.
```
pd.DataFrame(
    data=None,
    index: Union[Collection, NoneType] = None,
    columns: Union[Collection, NoneType] = None,
    dtype: Union[str, numpy.dtype, ForwardRef('ExtensionDtype'), NoneType] = None,
    copy: bool = False,
)
```
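
A minimal sketch of how `data`, `index` and `columns` fit together. The labels and the name `frame` are made up for illustration:

```python
import pandas as pd

frame = pd.DataFrame(
    data=[[1, "a"], [2, "b"]],  # one inner list per row
    index=["x", "y"],           # row labels
    columns=["num", "letter"],  # column labels
)
print(frame)
print(frame.shape)
```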


#### 1 - Create a DataFrame from two series

In [41]:
data = pd.array(['a','b','c'])
series1 = pd.Series(data)
series2 = pd.Series(data)

## Create the content
data = {
    'series1': series1,
    'series2':series2
}

## Create the DataFrame
my_df = pd.DataFrame(data)

## View the DataFrame
my_df

  series1 series2
0       a       a
1       b       b
2       c       c


In [46]:
## What if the Series don't have the same number of elements?

data = pd.array(['a','b','c'])
data1 = pd.array(['a','b','c','d'])
series1 = pd.Series(data)
series2 = pd.Series(data1)

## Create the content
data = {
    'series1': series1,
    'series2':series2
}

## Create the DataFrame
my_df = pd.DataFrame(data)

## View the DataFrame
my_df

  series1 series2
0       a       a
1       b       b
2       c       c
3    <NA>       d
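
The shorter Series is aligned on the index, and the row it has no value for is marked as missing. A small sketch of how such gaps could be counted or dropped afterwards, built from plain lists and using variable names chosen only for this example:

```python
import pandas as pd

short = pd.Series(["a", "b", "c"])
long = pd.Series(["a", "b", "c", "d"])
df = pd.DataFrame({"series1": short, "series2": long})

# Count the missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Keep only the rows where every column has a value
complete_rows = df.dropna()
print(complete_rows)
```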


#### 2 - Create a DataFrame from a single series

In [48]:
data = pd.array(['a','b','c'])

series1 = pd.Series(data)


## Create the content
data = {
    'series1': series1
}

## Create the DataFrame
my_df = pd.DataFrame(data)

## View the DataFrame
my_df
type(my_df)

pandas.core.frame.DataFrame

#### 3 - Create a DataFrame from a dict

In [50]:
my_dict ={
    'Name':['Adrian']
}

my_df = pd.DataFrame(my_dict)

my_df

     Name
0  Adrian


In [84]:
## Let's add an index to the DataFrame while the data holds only one value
my_dict ={
    'Name':['Adrian']
}



my_df = pd.DataFrame(my_dict,index=[1,2,3,4,5,6,7,8,9])

my_df

## by default the index starts at position 0 and has one label per item;
## here a single item was passed with an explicit index, so the item is repeated for every index label

     Name
1  Adrian
2  Adrian
3  Adrian
4  Adrian
5  Adrian
6  Adrian
7  Adrian
8  Adrian
9  Adrian
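
Passing a scalar instead of a one-element list is another way to repeat a value for every index label, in the same way the scalar Series example earlier repeats `5`. A minimal sketch (the name `repeated` is only for illustration):

```python
import pandas as pd

# A scalar value needs an explicit index and is broadcast to its length
repeated = pd.DataFrame({"Name": "Adrian"}, index=[1, 2, 3])
print(repeated)
```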


In [85]:
## We can also use a function to generate the index list
my_dict ={
    'Name':['Adrian']
}



my_df = pd.DataFrame(my_dict,index=list(np.arange(1,9)))

my_df

## np.arange(1, 9) excludes the stop value, so the index runs from 1 to 8
## and the single item is repeated for each label


     Name
1  Adrian
2  Adrian
3  Adrian
4  Adrian
5  Adrian
6  Adrian
7  Adrian
8  Adrian


#### 4 - Create a DataFrame from a dict of Series

In [93]:
d = {
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])
    }

pd.DataFrame(d)




   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
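
The resulting index is the union of the indexes of the two Series. When `index` or `columns` is passed as well, it selects and orders what ends up in the DataFrame. A short sketch, where the column name `three` is deliberately one that does not exist in the dict:

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# Keep only some rows, in the given order
subset = pd.DataFrame(d, index=["d", "b", "a"])
print(subset)

# Asking for a column that is not in the dict gives a column of missing values
with_extra = pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])
print(with_extra)
```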


#### 5 - Create a DataFrame from a list of dicts

In [94]:
my_data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

pd.DataFrame(my_data)

   a   b     c
0  1   2   NaN
1  5  10  20.0
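
The first dict has no `c` key, so that cell is filled with `NaN`. Since `NaN` is a float, the whole column is stored as `float64`, which is why `20` is displayed as `20.0`. A minimal sketch of checking the dtypes and, assuming `0` is an acceptable replacement for the gap, converting the column back to integers:

```python
import pandas as pd

my_data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
df = pd.DataFrame(my_data)

# Column c is float64 because it contains a missing value
dtypes_before = df.dtypes
print(dtypes_before)

# Assumption: 0 is a sensible default for the missing entry
df["c"] = df["c"].fillna(0).astype("int64")
print(df)
```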
