## Series

A Series is a one-dimensional array of values. Each Series has its own associated index, which labels every value.

In [1]:
import pandas as pd 
import numpy as np

series = pd.Series(dtype="float64")
print(f"{series}\n")

series = pd.Series(5)
print(f"Series(5):\n{series}\n")

series = pd.Series([1, 2, 3])
print(f"Series with arr:\n{series}\n")

series = pd.Series([1, 2.2]) 
print(f"Upcasting:\n{series}\n")

arr = np.array([1, 2])
series = pd.Series(arr, dtype=np.float32)
print(f"Using numpy arr:\n{series}\n")

series = pd.Series([[1, 2], [3, 4]])
print(f"2D series:\n{series}\n")

Series([], dtype: float64)

Series(5):
0    5
dtype: int64

Series with arr:
0    1
1    2
2    3
dtype: int64

Upcasting:
0    1.0
1    2.2
dtype: float64

Using numpy arr:
0    1.0
1    2.0
dtype: float32

2D series:
0    [1, 2]
1    [3, 4]
dtype: object





### Indexing

The default index is the integers from 0 to n - 1, where n is the number of elements in the Series. We can also provide a custom index through the **index** keyword. Each value in the **index** list must be of a hashable type.

In [2]:
series = pd.Series([1, 2, 3], index=["a", "b", "c"])
print(f"Series with custom index:\n{series}\n")

series = pd.Series([1, 2, 3], index=["a", 8, 0.3])
print(f"Series with new custom index:\n{series}")

Series with custom index:
a    1
b    2
c    3
dtype: int64

Series with new custom index:
a      1
8      2
0.3    3
dtype: int64


We can also pass in a dictionary, in which case the keys are used as the index and the values are used as the Series values.

In [3]:
series = pd.Series({"a": 1, "b": 2, "c": 3 })
print(f"Series from dictionary:\n{series}\n")


Series from dictionary:
a    1
b    2
c    3
dtype: int64



In [4]:
s1 = pd.Series([1, 2, 3])
s2 = s1 * [10, 20, 30]
print(f"Element-wise product:\n{s2}\n")

print(f"Scaling:\n{s1 * 10}")

Element-wise product:
0    10
1    40
2    90
dtype: int64

Scaling:
0    10
1    20
2    30
dtype: int64
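
The `*` operator in the cell above multiplies the two operands element by element and returns a Series; it does not reduce them to a single number. As a small sketch of the difference, the example below compares it with `Series.dot`, which sums the element-wise products. The variable names `elementwise` and `dot` are chosen here for illustration only.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([10, 20, 30])

# The * operator multiplies matching elements and returns a Series.
elementwise = s1 * s2

# Series.dot sums those element-wise products into a single scalar.
dot = s1.dot(s2)

print(f"Element-wise product:\n{elementwise}\n")
print(f"Dot product: {dot}")
```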


## DataFrame

One main purpose of pandas is to deal with data from tables or spreadsheets, which contain rows and columns. The **pandas.DataFrame** object is used to represent such data. Note that a DataFrame cannot be created from a scalar alone. In addition to *index*, the DataFrame constructor takes a *columns* keyword argument for the column labels.

In [5]:
df = pd.DataFrame()
print(f"{df}\n")

df = pd.DataFrame([5, 6])
print(f"Dataframe with one column:\n{df}\n")

df = pd.DataFrame([[5, 6], [1, 3]], index=["r1", "r2"], columns=["c1", "c2"])
print(f"2x2 Dataframe:\n{df}\n")

df = pd.DataFrame({"c1": [1, 2], "c2": [3, 4]}, index=["r1", "r2"])
print(f"Dataframe from dictionary:\n{df}\n")

Empty DataFrame
Columns: []
Index: []

Dataframe with one column:
   0
0  5
1  6

2x2 Dataframe:
    c1  c2
r1   5   6
r2   1   3

Dataframe from dictionary:
    c1  c2
r1   1   3
r2   2   4
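
To make the remark about scalars concrete, here is a minimal sketch. It assumes a recent pandas release, where passing a bare scalar to the constructor raises a `ValueError`, while a scalar combined with both *index* and *columns* is broadcast to every cell. The names `scalar_error` and `filled` are illustrative.

```python
import pandas as pd

# A bare scalar gives pandas no shape to work with, so the constructor raises.
try:
    pd.DataFrame(5)
    scalar_error = None
except ValueError as err:
    scalar_error = err
print(f"pd.DataFrame(5) raised: {scalar_error!r}\n")

# With index and columns supplied, the scalar is broadcast to every cell.
filled = pd.DataFrame(5, index=["r1", "r2"], columns=["c1", "c2"])
print(filled)
```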



Upcasting occurs on a per-column basis. The **dtypes** property returns the type of each column as a Series.

In [6]:
upcast = pd.DataFrame([[5, 6], [1.2, 3]])
print(f"Upcasted dataframe:\n{upcast}\n")

print(upcast.dtypes)

Upcasted dataframe:
     0  1
0  5.0  6
1  1.2  3

0    float64
1      int64
dtype: object


## Concatenating rows

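Additional rows are added with **pd.concat**, which takes a list of objects and returns a new object without changing the originals. (The older **obj.append(other)** method was removed in pandas 2.0.) We can concatenate a DataFrame with a Series or with another DataFrame. The Series is given a label through its *name* argument. Note that when a Series is concatenated directly with a DataFrame, pandas treats it as a column rather than a row, which is why the first two results below are padded with NaN. Setting *ignore_index* to True discards the existing row labels and renumbers the rows with integer indices.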

In [7]:
df = pd.DataFrame([[5, 6], [1.2, 3]])
ser = pd.Series([0, 0], name="r3")

new_df = pd.concat([df, ser])
print(f"Dataframe with appended series:\n{new_df}\n")

new_df = pd.concat([df, ser], ignore_index=True)
print(f"Changing indices to integers:\n{new_df}\n")

df1 = pd.DataFrame([[0, 0,], [9, 9]])
new_df = pd.concat([df, df1])
print(f"Appending dataframe to another dataframe:\n{new_df}\n")

Dataframe with appended series:
     0    1   r3
0  5.0  6.0  NaN
1  1.2  3.0  NaN
0  NaN  NaN  0.0
1  NaN  NaN  0.0

Changing indices to integers:
     0    1
0  5.0  6.0
1  1.2  3.0
2  0.0  NaN
3  0.0  NaN

Appending dataframe to another dataframe:
     0  1
0  5.0  6
1  1.2  3
0  0.0  0
1  9.0  9
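
If the intent is to add the Series as a new row, one possible approach (a sketch, not the only way) is to turn it into a one-row DataFrame first with `to_frame().T`, so that its values line up with the existing columns and its name becomes the row label. The names `row`, `appended` and `renumbered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[5, 6], [1.2, 3]])
ser = pd.Series([0, 0], name="r3")

# to_frame() makes a single column named "r3"; .T transposes it into
# a single row labelled "r3" whose columns are 0 and 1, matching df.
row = ser.to_frame().T

appended = pd.concat([df, row])
print(f"Series appended as a row:\n{appended}\n")

# ignore_index=True replaces the labels 0, 1, "r3" with 0, 1, 2.
renumbered = pd.concat([df, row], ignore_index=True)
print(f"With integer row labels:\n{renumbered}\n")
```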



## Dropping data

The **drop** function is used to drop rows or columns from a DataFrame. No single argument is required, but at least one of *labels*, *index*, or *columns* must be given. The *labels* keyword specifies the labels of the rows or columns to drop, and the *axis* keyword (default 0) selects whether those labels refer to rows (0) or columns (1).

You can also use *index* or *columns* to specify the labels of the rows or columns to drop directly. In every case **drop** returns a new DataFrame and leaves the original unchanged.

In [8]:
df = pd.DataFrame({"c1": [1, 2], "c2": [3, 4], "c3": [5, 6]}, index=["r1", "r2"])
print(f"Original dataframe:\n{df}\n\n")

df_drop = df.drop(labels="r1")
print(f"r1 dropped:\n{df_drop}\n\n")

df_drop = df.drop(labels=["c1", "c3"], axis=1)
print(f"Dropping c1 and c3:\n{df_drop}\n\n")

df_drop = df.drop(index="r2")
print(f"Dropping row 2:\n{df_drop}\n\n")

df_drop = df.drop(columns="c2")
print(f"Dropping column 2:\n{df_drop}\n\n")

# The result is discarded here, so df itself is not modified
df.drop(index="r2", columns="c2")
print(f"Original dataframe still remains intact:\n{df}\n\n")

Original dataframe:
    c1  c2  c3
r1   1   3   5
r2   2   4   6


r1 dropped:
    c1  c2  c3
r2   2   4   6


Dropping c1 and c3:
    c2
r1   3
r2   4


Dropping row 2:
    c1  c2  c3
r1   1   3   5


Dropping column 2:
    c1  c3
r1   1   5
r2   2   6


Original dataframe still remains intact:
    c1  c2  c3
r1   1   3   5
r2   2   4   6
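
The two ways of naming what to drop are interchangeable. The short sketch below checks that `labels=` with `axis=1` gives the same result as `columns=`, and shows *index* and *columns* combined in one call. The names `by_axis`, `by_keyword` and `both` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 2], "c2": [3, 4], "c3": [5, 6]}, index=["r1", "r2"])

# Two spellings of the same operation: drop columns c1 and c3.
by_axis = df.drop(labels=["c1", "c3"], axis=1)
by_keyword = df.drop(columns=["c1", "c3"])
print(f"Same result: {by_axis.equals(by_keyword)}\n")

# index and columns can be given together to drop from both axes at once.
both = df.drop(index="r2", columns="c2")
print(f"Dropping row r2 and column c2:\n{both}\n")
```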




## Merging

We can also merge two DataFrames using the **pd.merge** function, which takes two DataFrame objects as its two required arguments. If no keyword is provided, **pd.merge** joins the two DataFrames on all of their common column labels and keeps only the rows whose values in those columns match in both (an inner join).

In [9]:
mlb_df1 = pd.DataFrame({"name": ["john doe", "ross mike", "sam blue", "jane doe"], 
                        "pos": ["1B", "C", "P", "2B"],
                        "year": [2000 + i for i in range(4)]})
mlb_df2 = pd.DataFrame({"name": ["john doe", "ross mike", "jack chan"],
                       "year": [2000, 2001, 2005],
                       "rb1" : [80, 100, 12]})
for df in [mlb_df1, mlb_df2]:
    print(f"{df}\n")

mlb_merged = pd.merge(mlb_df1, mlb_df2)
print(f"{mlb_merged}\n")

        name pos  year
0   john doe  1B  2000
1  ross mike   C  2001
2   sam blue   P  2002
3   jane doe  2B  2003

        name  year  rb1
0   john doe  2000   80
1  ross mike  2001  100
2  jack chan  2005   12

        name pos  year  rb1
0   john doe  1B  2000   80
1  ross mike   C  2001  100
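
The default behaviour can be spelled out with the *on* and *how* keywords of **pd.merge**. The sketch below reuses the two tables from the cell above; it shows that the default call is equivalent to an inner join on the shared columns, and that `how="outer"` keeps the unmatched rows from both sides and fills the gaps with NaN. The names `inner`, `explicit` and `outer` are illustrative.

```python
import pandas as pd

mlb_df1 = pd.DataFrame({"name": ["john doe", "ross mike", "sam blue", "jane doe"],
                        "pos": ["1B", "C", "P", "2B"],
                        "year": [2000 + i for i in range(4)]})
mlb_df2 = pd.DataFrame({"name": ["john doe", "ross mike", "jack chan"],
                        "year": [2000, 2001, 2005],
                        "rb1": [80, 100, 12]})

# Default call: inner join on every column the two frames share.
inner = pd.merge(mlb_df1, mlb_df2)

# The same join with the keys and join type written out.
explicit = pd.merge(mlb_df1, mlb_df2, on=["name", "year"], how="inner")
print(f"Default equals explicit inner join: {inner.equals(explicit)}\n")

# An outer join keeps rows found in only one frame; missing cells become NaN.
outer = pd.merge(mlb_df1, mlb_df2, how="outer")
print(f"Outer join:\n{outer}\n")
```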



## Indexing

When we index into a DataFrame, we can treat the DataFrame as a dictionary of Series objects, where each column is a Series and each column label is a key.

In [12]:
df = pd.DataFrame({"c1": [1, 2], "c2": [3, 4], 
                   "c3": [5, 6]}, index=["r1", "r2"])


print(f"Column 1:\n{df['c1']}\n")
print(f"Column 2:\n{df['c2']}\n")
print(f"Columns 2 and 3:\n{df[['c2', 'c3']]}\n")

Column 1:
r1    1
r2    2
Name: c1, dtype: int64

Column 2:
r1    3
r2    4
Name: c2, dtype: int64

Columns 2 and 3:
    c2  c3
r1   3   5
r2   4   6
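
The dictionary analogy extends to a few other dictionary-style operations. As a small sketch, `keys()` returns the column labels, the `in` operator tests column labels (not row labels), and `items()` yields each label together with its column as a Series.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 2], "c2": [3, 4],
                   "c3": [5, 6]}, index=["r1", "r2"])

# keys() returns the column labels, just like dictionary keys.
print(f"Keys: {list(df.keys())}")

# Membership tests look at column labels, not row labels.
print(f"'c1' in df: {'c1' in df}")
print(f"'r1' in df: {'r1' in df}\n")

# items() yields (column label, Series) pairs.
for label, column in df.items():
    print(f"{label}: {type(column).__name__} with values {column.tolist()}")
```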



Direct indexing can also return a subset of the rows as a DataFrame. Rows, however, can only be retrieved with slices. An integer slice excludes its end position, while a label slice includes both endpoints.

In [14]:
df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6], 
                   "c3": [7, 8, 9], }, index=["r1", "r2", "r3"])

print(f"Original Dataframe:\n{df}\n")
print(f"First two rows:\n{df[0:2]}\n")
print(f"Last two rows:\n{df['r2': 'r3']}\n")


Original Dataframe:
    c1  c2  c3
r1   1   4   7
r2   2   5   8
r3   3   6   9

First two rows:
    c1  c2  c3
r1   1   4   7
r2   2   5   8

Last two rows:
    c1  c2  c3
r2   2   5   8
r3   3   6   9
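
A brief sketch of why slices are needed: a single row label passed to `df[...]` is looked up among the column labels and raises a `KeyError`, whereas a slice is interpreted along the rows. It also shows the difference in endpoint handling between integer and label slices. The names `row_error`, `first_two` and `label_slice` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6],
                   "c3": [7, 8, 9]}, index=["r1", "r2", "r3"])

# A single label is treated as a column key, so a row label fails.
try:
    df["r1"]
    row_error = None
except KeyError as err:
    row_error = err
print(f"df['r1'] raised: {row_error!r}\n")

# Integer slice: the end position (2) is excluded.
first_two = df[0:2]
print(f"df[0:2]:\n{first_two}\n")

# Label slice: the end label ('r2') is included.
label_slice = df["r1":"r2"]
print(f"df['r1':'r2']:\n{label_slice}\n")
```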



There are other ways of indexing a DataFrame: the **loc** and **iloc** properties. **iloc** accesses rows by their integer position.

In [15]:
print(f"Original Dataframe:\n{df}\n")
print(f"df.iloc[1]:\n{df.iloc[1]}\n")
print(f"df.iloc[[0, 2]]:\n{df.iloc[[0, 2]]}\n")
print(f"Using boolean list [False, True, True]:\n{df.iloc[[False, True, True]]}\n")

Original Dataframe:
    c1  c2  c3
r1   1   4   7
r2   2   5   8
r3   3   6   9

df.iloc[1]:
c1    2
c2    5
c3    8
Name: r2, dtype: int64

df.iloc[[0, 2]]:
    c1  c2  c3
r1   1   4   7
r3   3   6   9

Using boolean list [False, True, True]:
    c1  c2  c3
r2   2   5   8
r3   3   6   9
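
Beyond the row-only forms shown above, **iloc** also accepts slices, negative positions, and a second indexer for column positions. The following is a minimal sketch of those forms on the same data; the names `single`, `block` and `last` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6],
                   "c3": [7, 8, 9]}, index=["r1", "r2", "r3"])

# Row position 1, column position 2: a single value.
single = df.iloc[1, 2]
print(f"df.iloc[1, 2]: {single}\n")

# Row positions 0 and 1 (slice end excluded), column positions 0 and 2.
block = df.iloc[0:2, [0, 2]]
print(f"df.iloc[0:2, [0, 2]]:\n{block}\n")

# Negative positions count from the end, as with Python lists.
last = df.iloc[-1]
print(f"df.iloc[-1]:\n{last}\n")
```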



The **loc** property has the same row indexing functionality as **iloc**, but it uses row labels rather than integer positions. In addition, **loc** can perform column indexing along with row indexing, and can set new values in a DataFrame for specific rows and columns.

In [16]:
print(f"Original DataFrame:\n{df}\n")
print(f"df.loc['r2']:\n{df.loc['r2']}\n")
print(f"Using boolean list:\n{df.loc[[False, True, True]]}\n")

print(f"Getting a single value:\n{df.loc['r1', 'c2']}\n")
print(f"Different rows and a column:\n{df.loc[['r1', 'r3'], 'c2']}\n")

# Setting a value 
df.loc[['r1', 'r3'], 'c2'] = 0
print(f"Original DataFrame:\n{df}\n")

Original DataFrame:
    c1  c2  c3
r1   1   4   7
r2   2   5   8
r3   3   6   9

df.loc['r2']:
c1    2
c2    5
c3    8
Name: r2, dtype: int64

Using boolean list:
    c1  c2  c3
r2   2   5   8
r3   3   6   9

Getting a single value:
4

Different rows and a column:
r1    4
r3    6
Name: c2, dtype: int64

Original DataFrame:
    c1  c2  c3
r1   1   0   7
r2   2   5   8
r3   3   0   9

