# Import libraries

In [1]:
from datetime import datetime

import numpy as np
import pandas as pd

# Pandas objects

At the most basic level, Pandas objects can be thought of as enhanced versions of `NumPy` structured arrays in which the rows and columns are identified with _labels_ rather than simple integer indices.

The three fundamental data structures in `pandas` are: `Series`, `DataFrame`, and `Index`. We'll dive into each one in more detail in the subsequent sections.

## `Series`:

A Pandas `Series` is a one-dimensional array of **indexed** data. It can be created from a `list`, a `dictionary`, or a `NumPy` array.

It can be thought of as a **one-dimensional** `NumPy` array of **homogeneous** values, accompanied by a _labeled_ axis and a _name_.
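As a minimal sketch of the three construction routes mentioned above, the cell below builds a `Series` from a `list`, a `dictionary`, and a `NumPy` array. The variable names, labels, and `name` values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns the default integer index 0, 1, 2.
from_list = pd.Series([10, 20, 30], name="from_list")

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"a": 10, "b": 20, "c": 30}, name="from_dict")

# From a NumPy array, with explicit labels passed through `index`.
from_array = pd.Series(
    np.array([10, 20, 30]), index=["x", "y", "z"], name="from_array"
)

# Print the name, the index labels, and the values of each Series.
for series in (from_list, from_dict, from_array):
    print(series.name, list(series.index), series.to_list())
```

All three hold the same values; only the labels attached to them differ.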

The following figure illustrates the difference between a `NumPy` array and a `pd.Series`:

<div>
    <img src="img/series-vs-np-array.png" alt='numpy-array-vs-pandas-series' width="800"/>
</div>

The essential difference is the presence of the index. While the `NumPy` array has an _implicitly_ defined integer index used to access the values, the Pandas `Series` has an _explicitly_ defined index associated with the values.

We'll talk about the advantages of the `Index` in a minute.

In [2]:
example_series = pd.Series(data=[10, 20, 30])

In [3]:
example_series

0    10
1    20
2    30
dtype: int64

In [4]:
print(f"example_series.values: {example_series.values}")
print(f"example_series.index: {example_series.index}")
print(f"example_series.shape: {example_series.shape}")
print(f"example_series.dtype: {example_series.dtype}")

example_series.values: [10 20 30]
example_series.index: RangeIndex(start=0, stop=3, step=1)
example_series.shape: (3,)
example_series.dtype: int64


In [5]:
example_np_array = np.array([10, 20, 30])

In [6]:
# the array itself (a NumPy array has no separate values attribute)
print(f"example_np_array: {example_np_array}")
# shape and dtype attributes (a NumPy array has no index attribute)
print(f"example_np_array.shape: {example_np_array.shape}")
print(f"example_np_array.dtype: {example_np_array.dtype}")

example_np_array: [10 20 30]
example_np_array.shape: (3,)
example_np_array.dtype: int32


As we can see, the `Series` wraps both a sequence of values and a sequence of index labels. In this example, the default `RangeIndex` was used because no `index` was passed to the `Series` constructor.

As with `NumPy` arrays, we can access `Series` data with square-bracket indexing and slicing:

In [7]:
print(f"first element in example_series: {example_series[0]}")

first element in example_series: 10


In [8]:
print(f"first two elements in example_series: \n{example_series[:2]}")

first two elements in example_series: 
0    10
1    20
dtype: int64


In [9]:
student_grades_series = pd.Series(
    data=[90, 90, 73, 57, 91], index=["Sami", "Ahmed", "Qusai", "Saeed", "Yamen"]
)

In [10]:
student_grades_series

Sami     90
Ahmed    90
Qusai    73
Saeed    57
Yamen    91
dtype: int64

In [11]:
student_grades_series["Sami"]

90

In [12]:
student_grades_series["Qusai":]

Qusai    73
Saeed    57
Yamen    91
dtype: int64
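One detail worth noting about slicing by label: unlike positional slices, a label-based slice includes its end label. The sketch below rebuilds the grades `Series` under the hypothetical name `grades` and uses the explicit `.loc` (label) and `.iloc` (position) accessors to make the difference visible.

```python
import pandas as pd

grades = pd.Series(
    data=[90, 90, 73, 57, 91], index=["Sami", "Ahmed", "Qusai", "Saeed", "Yamen"]
)

# Label-based slice: both "Ahmed" and "Saeed" are included.
by_label = grades.loc["Ahmed":"Saeed"]

# Position-based slice: position 1 up to, but not including, position 4.
by_position = grades.iloc[1:4]

print(by_label)
print(by_label.equals(by_position))
```

Using `.loc` and `.iloc` states the intent explicitly, which helps when the index itself contains integers.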

In [13]:
date_format = "%Y-%m-%d"

In [14]:
temperatures_list = [
    33,
    36,
    39,
    41,
    42,
    40,
    38
]

In [15]:
dates_list = [
    datetime.strptime("2020-09-01", date_format),
    datetime.strptime("2020-09-02", date_format),
    datetime.strptime("2020-09-03", date_format),
    datetime.strptime("2020-09-04", date_format),
    datetime.strptime("2020-09-05", date_format),
    datetime.strptime("2020-09-06", date_format),
    datetime.strptime("2020-09-07", date_format),
]

In [16]:
daily_temperature_series = pd.Series(
    data=temperatures_list,
    index=dates_list
)

In [None]:
daily_temperature_series.index

In [None]:
type(student_grades_series.index)

In [None]:
type(example_series.index)
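The three cells above inspect the index of each `Series`. As a self-contained sketch with shortened data, the snippet below shows that the kind of labels supplied decides which `Index` subclass pandas builds. The variable names are illustrative only.

```python
from datetime import datetime

import pandas as pd

# No index given: pandas falls back to a RangeIndex.
default_index = pd.Series([10, 20, 30]).index

# String labels: a generic Index.
label_index = pd.Series([90, 73], index=["Sami", "Qusai"]).index

# datetime labels: pandas converts them to a DatetimeIndex.
date_index = pd.Series(
    [33, 36], index=[datetime(2020, 9, 1), datetime(2020, 9, 2)]
).index

for index in (default_index, label_index, date_index):
    print(type(index).__name__)
```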

The `data` argument passed to `pd.Series` must be **one-dimensional**. Otherwise, an exception is raised:

In [None]:
arr = np.arange(4).reshape((2, 2))

In [None]:
pd.Series(arr)
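A minimal sketch of handling this case, assuming a recent pandas version in which the error is a `ValueError`: catch the exception, then flatten the array with `ravel()` before building the `Series`. The name `flattened_series` is made up for this example.

```python
import numpy as np
import pandas as pd

arr = np.arange(4).reshape((2, 2))

try:
    pd.Series(arr)
except ValueError as error:
    # Assumption: recent pandas versions raise ValueError for 2-D input.
    print(f"pd.Series rejected the 2-D array: {error}")

# Flattening to one dimension makes the data acceptable.
flattened_series = pd.Series(arr.ravel())
print(flattened_series.to_list())
```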

## `DataFrame`:

In [None]:
syria_governorate_population_dict = {
    "Aleppo Governorate": 4600166,
    "Raqqa Governorate": 919000,
    "As-Suwayda Governorate": 364000,
    "Damascus Governorate": 2211042,
    "Daraa Governorate": 998000,
    "Deir ez-Zor Governorate": 1200500,
    "Hama Governorate": 1593000,
    "Hasaka Governorate": 1272702,
    "Homs Governorate": 1762500,
    "Idlib Governorate": 1464000,
    "Latakia Governorate": 1278486,
    "Quneitra Governorate": 87000,
    "Rif Dimashq Governorate": 2831738,
    "Tartus Governorate": 785000,
}

syria_governorate_area_dict = {
    "Aleppo Governorate": 18482,
    "Raqqa Governorate": 19616,
    "As-Suwayda Governorate": 5550,
    "Damascus Governorate": 1599,
    "Daraa Governorate": 3730,
    "Deir ez-Zor Governorate": 33060,
    "Hama Governorate": 8883,
    "Hasaka Governorate": 23334,
    "Homs Governorate": 42223,
    "Idlib Governorate": 6097,
    "Latakia Governorate": 2297,
    "Quneitra Governorate": 1861,
    "Rif Dimashq Governorate": 18032,
    "Tartus Governorate": 1892,
}

In [None]:
pd.DataFrame([syria_governorate_population_dict, syria_governorate_area_dict])
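Passing a list of dictionaries produces one row per dictionary and one column per key, so the governorates end up as columns. If one row per governorate is preferred, a dictionary of dictionaries is one possible alternative. The sketch below uses a three-entry subset of the data above; the column names `population` and `area` are chosen for illustration.

```python
import pandas as pd

population = {
    "Aleppo Governorate": 4600166,
    "Damascus Governorate": 2211042,
    "Tartus Governorate": 785000,
}
area = {
    "Aleppo Governorate": 18482,
    "Damascus Governorate": 1599,
    "Tartus Governorate": 1892,
}

# List of dicts: each dict is a row, each key is a column.
wide_df = pd.DataFrame([population, area])

# Dict of dicts: outer keys are columns, inner keys are row labels.
long_df = pd.DataFrame({"population": population, "area": area})

print(wide_df.shape)
print(long_df.shape)
print(long_df.loc["Tartus Governorate", "area"])
```

In the second layout each column holds a single kind of quantity, which tends to be easier to filter and aggregate.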

# `pd.read_html` example:

In [None]:
syria_governorates_df = pd.read_html(
    "https://en.wikipedia.org/wiki/Governorates_of_Syria", match="Governorate name"
)[0]