## **Pandas** - Panel Data | Python Data Analysis

**Pandas** *(styled as **pandas**)* is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals, as well as a play on the phrase "Python data analysis". Wes McKinney started building what would become Pandas at AQR Capital while he was a researcher there from 2007 to 2010.

Pandas brought to Python many of the DataFrame features that were already established in the R programming language. **The library is built on top of another library, NumPy.** [pandas-wikipedia](https://en.wikipedia.org/wiki/Pandas_(software))

---
### Resources
* [Official Page](https://pandas.pydata.org/)
* [Official Docs](https://pandas.pydata.org/docs/)
* Lecture Notes *(refer to ../LectureNotes(HKUST)/13-pandas.pdf)*

### Pandas Series

A Pandas Series is a one-dimensional **labeled array** in the Pandas library for Python. It is a fundamental data structure in Pandas and can be thought of as a single column of data in a spreadsheet or a single column within a Pandas DataFrame. 

p.s. **labeled array** means that, in addition to positional indexing such as `[0]`, elements can be accessed by their label, such as `['name']`.
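
A small sketch of the difference, using a hypothetical Series `scores` (the labels and values are made up for illustration): the same element is reached once by its label and once by its position.

```python
import pandas

# a hypothetical Series with string labels
scores = pandas.Series(data=[90, 75, 60], index=['alice', 'bob', 'carol'])

by_label = scores['bob']        # label-based access
by_position = scores.iloc[1]    # position-based access

print(by_label, by_position)
```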

In [1]:
```python
import pandas
import numpy

# Let's create a pandas Series out of a NumPy array
# (NumPy coerces this mixed list to strings, so the age is stored as '19', not 19)
data: numpy.ndarray = numpy.array(object = ['LI', 'Hantang', 'Male', 19])
series = pandas.Series(data = data, index = ['family_name', 'given_name', 'gender', 'age'])

# Let's print the series to have a peek into it
print("--------- Our First Series ---------")
print(series)
```

```
--------- Our First Series ---------
family_name         LI
given_name     Hantang
gender            Male
age                 19
dtype: object
```


In [2]:
```python
# Indexing
print("------------- Indexing -------------")
print(f"Your last name is: {series.loc['family_name']} (using named index)")
print(f"Your first name is: {series.iloc[1]} (using numerical index)")

# Note that:
# 1. <series>.loc[] takes a NAMED (label) index;
# 2. <series>.iloc[] takes a numerical position (just like a normal array).
```

```
------------- Indexing -------------
Your last name is: LI (using named index)
Your first name is: Hantang (using numerical index)
```


In [3]:
```python
# Slicing (just like Python lists, tuples and NumPy ndarrays)
print("------------- Slicing --------------")
name_series = series.loc['family_name' : 'given_name']
print(f"Series containing your name:\n{name_series}")
name_series.iloc[0] = "LIAN"
name_series.iloc[1] = "TANG"
print(f"Original series after modifying the slice:\n{series}")
print(f"Modified slice:\n{name_series}")
# Here the slice was a VIEW, so the original series changed too.
# With Copy-on-Write enabled (opt-in in pandas 2.x, the default in pandas 3.0),
# the original is NOT modified; call .copy() if you want an independent slice.

# !!! Note that, unlike lists and ndarrays:
# - with .loc, BOTH the start and stop labels are included in the slice;
# - .iloc behaves like NumPy arrays and lists: the start position is included
#   and the stop position is excluded.
```

```
------------- Slicing --------------
Series containing your name:
family_name         LI
given_name     Hantang
dtype: object
Original series after modifying the slice:
family_name    LIAN
given_name     TANG
gender         Male
age              19
dtype: object
Modified slice:
family_name    LIAN
given_name     TANG
dtype: object
```
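
To keep the original untouched regardless of pandas version, take an explicit copy of the slice. A minimal sketch, where the Series `person` is a hypothetical stand-in for `series` above; it also shows the inclusive `.loc` stop versus the exclusive `.iloc` stop.

```python
import pandas

person = pandas.Series(data=['LI', 'Hantang', 'Male', '19'],
                       index=['family_name', 'given_name', 'gender', 'age'])

# .copy() gives an independent object, so edits never reach `person`
name_copy = person.loc['family_name':'given_name'].copy()
name_copy.iloc[0] = "LIAN"

# .loc includes the stop label; .iloc excludes the stop position
loc_slice = person.loc['family_name':'gender']   # 3 elements
iloc_slice = person.iloc[0:2]                     # 2 elements

print(person['family_name'], name_copy['family_name'])
print(len(loc_slice), len(iloc_slice))
```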


In [4]:
```python
# Masking
print("------------- Masking --------------")
numerical = pandas.Series(data = [1, 2, 3, 4, 5, 6]) # if no index is specified, 0, 1, ... are used
mask = numerical > 3
print(f"Mask:\n{mask}")
print(f"Masked Series:\n{numerical[mask]}") # plain [] accepts a boolean mask (numerical.loc[mask] works too)
```

```
------------- Masking --------------
Mask:
0    False
1    False
2    False
3     True
4     True
5     True
dtype: bool
Masked Series:
3    4
4    5
5    6
dtype: int64
```
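
Masks can also come from methods that return booleans. A short sketch using `Series.isin()` and `Series.between()` (both standard pandas methods; the data is made up), plus a check that `.loc[mask]` selects the same rows as `[mask]`:

```python
import pandas

numbers = pandas.Series(data=[1, 2, 3, 4, 5, 6])

# True where the value is one of the listed values
in_set = numbers[numbers.isin([2, 4, 6])]

# True where 2 <= value <= 4 (both ends inclusive by default)
in_range = numbers[numbers.between(2, 4)]

# .loc[mask] selects the same rows as [mask]
same = numbers.loc[numbers > 3].equals(numbers[numbers > 3])

print(in_set.tolist(), in_range.tolist(), same)
```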


### Pandas DataFrame

A DataFrame is a powerful 2-dimensional data structure in Pandas, similar to a spreadsheet or SQL table.

It consists of **rows and columns**, where **each column is a Series object**. Different columns can hold different data types, but all columns share the same (row) index.

Each column in a DataFrame has a unique name, which allows for easy access and manipulation of the data.

DataFrames are ideal for handling structured data and performing complex data analysis and manipulation tasks.
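
A quick sketch that checks these claims on a tiny hypothetical DataFrame: a column comes back as a Series, it keeps its own dtype, and it shares the frame's index.

```python
import pandas

frame = pandas.DataFrame({'name': ['A', 'B'], 'age': [19, 20]},
                         index=['row1', 'row2'])

age_column = frame['age']                          # a single column
is_series = isinstance(age_column, pandas.Series)  # columns are Series objects
shares_index = age_column.index.equals(frame.index)
age_dtype = str(frame.dtypes['age'])               # an integer dtype, unlike 'name'

print(is_series, shares_index, age_dtype)
```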

#### Create a DataFrame from Scratch

*Constructor parameters:*
```python
pandas.DataFrame
```
> **data** : *ndarray (structured or homogeneous), Iterable, dict, or DataFrame*
>     Dict can contain Series, arrays, constants, dataclass or list-like objects. If
>     data is a dict, column order follows insertion-order. If a dict contains Series
>     which have an index defined, it is aligned by its index. This alignment also
>     occurs if data is a Series or a DataFrame itself. Alignment is done on
>     Series/DataFrame inputs.
>     If data is a list of dicts, column order follows insertion-order.
>
> **index** : *Index or array-like*
>     Index to use for resulting frame. Will default to RangeIndex if
>     no indexing information part of input data and no index provided.
>
> **columns** : *Index or array-like*
>     Column labels to use for resulting frame when data does not have them,
>     defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,
>     will perform column selection instead.
>
> **dtype** : *dtype, default None*
>     Data type to force. Only a single dtype is allowed. If None, infer.
>
> **copy** : *bool or None, default None*
>     Copy data from inputs.
>     For dict data, the default of None behaves like ``copy=True``.  For DataFrame
>     or 2d ndarray input, the default of None behaves like ``copy=False``.
>     If data is a dict containing one or more Series (possibly of different dtypes),
>     ``copy=False`` will ensure that these inputs are not copied.
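
Two behaviours from the parameter list above, sketched with made-up data: a list of dicts fills absent keys with `NaN`, and passing `columns` when the data already has column labels selects those columns instead of renaming them.

```python
import pandas

# list of dicts: columns follow insertion order; the missing 'b' becomes NaN
rows = pandas.DataFrame([{'a': 1, 'b': 2}, {'a': 3}])

# data already has labels 'a' and 'b', so columns=['b'] SELECTS column 'b'
only_b = pandas.DataFrame({'a': [1, 2], 'b': [3, 4]}, columns=['b'])

print(rows)
print(only_b.columns.tolist())
```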

In [5]:
```python
import pandas

# some data first
mr_candy = pandas.Series(data = ['LEE', 'Hantang', 'Male', 19],
                         index = ['family_name', 'given_name', 'gender', 'age'])
mr_joggy = pandas.Series(data = ['WONG', 'Zhengyang', 'Male', 20],
                         index = ['family_name', 'given_name', 'gender', 'age'])
ms_misty = pandas.Series(data = ['N/a', 'カスミ', 'Female', 10],
                         index = ['family_name', 'given_name', 'gender', 'age'])

# let's create a dataframe from scratch
dataframe = pandas.DataFrame(data = [mr_candy, mr_joggy, ms_misty])

# let's visualize
dataframe
```

| | family_name | given_name | gender | age |
|---|---|---|---|---|
| 0 | LEE | Hantang | Male | 19 |
| 1 | WONG | Zhengyang | Male | 20 |
| 2 | N/a | カスミ | Female | 10 |


In [6]:
```python
import pandas

# you can also build the DataFrame column by column (the previous cell built it row by row)
family_names = pandas.Series(data = ['LEE', 'WONG', 'N/a'])
given_names  = pandas.Series(data = ['Hantang', 'Zhengyang', 'カスミ'])
genders      = pandas.Series(data = ['Male', 'Male', 'Female'])
ages         = pandas.Series(data = [19, 20, 10])

dataframe = pandas.DataFrame(data =
                             {'family_name' : family_names,
                              'given_name' : given_names,
                              'gender' : genders,
                              'age' : ages})

# visualize
dataframe
```

| | family_name | given_name | gender | age |
|---|---|---|---|---|
| 0 | LEE | Hantang | Male | 19 |
| 1 | WONG | Zhengyang | Male | 20 |
| 2 | N/a | カスミ | Female | 10 |


In [7]:
```python
import pandas

# You can also create a dataframe with missing fields
# Let's use the second format (column insertion)
family_names = pandas.Series(data = ['LEE', 'WONG'], index = ['person1', 'person2'])
given_names  = pandas.Series(data = ['Hantang', 'Zhengyang', 'カスミ'], index = ['person1', 'person2', 'person3'])
genders      = pandas.Series(data = ['Male', 'Female'], index = ['person2', 'person3'])
ages         = pandas.Series(data = [19, 20, 10], index = ['person1', 'person2', 'person3'])

dataframe = pandas.DataFrame(data =
                             {'family_name' : family_names,
                              'given_name' : given_names,
                              'gender' : genders,
                              'age' : ages})

# visualize
dataframe
```

| | family_name | given_name | gender | age |
|---|---|---|---|---|
| person1 | LEE | Hantang | NaN | 19 |
| person2 | WONG | Zhengyang | Male | 20 |
| person3 | NaN | カスミ | Female | 10 |


_Note:_
* `NaN` in the displayed DataFrame marks a missing field;
* The index does not have to be a numerical sequence; labels such as 'person1', 'person2', ... work as well.

#### Drop 'NaN's

```python
pandas.DataFrame.dropna()
```
**axis** : *{0 or 'index', 1 or 'columns'}*, default 0
Determine if rows or columns which contain missing values are removed.

> 0, or 'index': Drop rows which contain missing values.
>
> 1, or 'columns': Drop columns which contain missing values.
> Only a single axis is allowed.

**how** : *{'any', 'all'}*, default 'any'
Determine if a row or column is removed from the DataFrame when it has at least one NA or all NA.

> 'any': If any NA values are present, drop that row or column.
>
> 'all': If all values are NA, drop that row or column.

**inplace** : *bool*, default False
Whether to modify the DataFrame rather than creating a new one.

> This line of code will directly modify 'dataframe' (and return `None`)
> ```python
> dataframe.dropna(axis = 0, how = 'any', inplace = True)
> ```
> While this will NOT (it returns a new, modified DataFrame object)
> ```python
> cleaned_dataframe = dataframe.dropna(axis = 0, how = 'any', inplace = False)
> ```
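
A small sketch comparing the options on a made-up frame, where one row is entirely missing and another is only partly missing. `subset` (a real `dropna` parameter not listed above) restricts the check to certain columns.

```python
import numpy
import pandas

frame = pandas.DataFrame({'a': [1.0, numpy.nan, numpy.nan],
                          'b': [2.0, numpy.nan, 5.0]})
# row 1 is all NaN; row 2 has a NaN only in column 'a'

drop_any = frame.dropna(axis=0, how='any')     # keeps only row 0
drop_all = frame.dropna(axis=0, how='all')     # drops only row 1
drop_cols = frame.dropna(axis=1, how='any')    # both columns contain NaN, so none are left
drop_subset = frame.dropna(subset=['b'])       # only looks at column 'b'

print(drop_any.index.tolist(), drop_all.index.tolist(),
      drop_cols.shape, drop_subset.index.tolist())
```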

In [8]:
"""
You must run the previous cell first. (To get 'dataframe' initialized.)
"""

# You can use .dropna() to remove rows or columns with NaN
dataframe.dropna(axis = 0, how = 'any', inplace = True)

# visualize
dataframe

Unnamed: 0,family_name,given_name,gender,age
person2,WONG,Zhengyang,Male,20


#### Indexing a DataFrame

In other words, 'accessing' a DataFrame

In [9]:
```python
import pandas

# Again, some data first
mr_candy = pandas.Series(data = ['LEE', 'Hantang', 'Male', 19],
                         index = ['family_name', 'given_name', 'gender', 'age'])
mr_joggy = pandas.Series(data = ['WONG', 'Zhengyang', 'Male', 20],
                         index = ['family_name', 'given_name', 'gender', 'age'])
ms_misty = pandas.Series(data = ['カスミ', 'Female', 10],
                         index = ['given_name', 'gender', 'age'])   # for a missing field, simply leave out its label

# And a dataframe (row labels go in `index`)
dataframe = pandas.DataFrame(data = [mr_candy, mr_joggy, ms_misty], index = ['mr_candy', 'mr_joggy', 'ms_misty'])

# let's visualize
dataframe
```

| | family_name | given_name | gender | age |
|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 |
| mr_joggy | WONG | Zhengyang | Male | 20 |
| ms_misty | NaN | カスミ | Female | 10 |


You can access a column of a DataFrame by specifying the column name in square
brackets [ ].

It returns a `pandas.Series` with the selected column.

In [10]:
"""
You must run the previous cell first. (To get 'dataframe' correctly set.)
"""

# obtain all family names (whole column)
retrieved_family_names = dataframe['family_name']

# visualize
print(retrieved_family_names)

# Noticed that the 'NaN' is an element in that Series

mr_candy     LEE
mr_joggy    WONG
ms_misty     NaN
Name: family_name, dtype: object


You can access a single DataFrame row using the same methods as for Series:
- .loc for label-based indexing
- .iloc for position-based indexing

It returns a `pandas.Series` with an element for each column.
The index contains the names of the columns.

In [11]:
```python
# obtain info for a single person (whole row)
retrieved_mr_candy = dataframe.loc['mr_candy']
retrieved_ms_misty = dataframe.iloc[2]

# visualize
print("-------- retrieved_mr_candy --------")
print(retrieved_mr_candy)
print("-------- retrieved_ms_misty --------")
print(retrieved_ms_misty)
```

```
-------- retrieved_mr_candy --------
family_name        LEE
given_name     Hantang
gender            Male
age                 19
Name: mr_candy, dtype: object
-------- retrieved_ms_misty --------
family_name       NaN
given_name        カスミ
gender         Female
age                10
Name: ms_misty, dtype: object
```


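You can also slice a DataFrame by selecting rows and/or columns.

Within a single `.loc` or `.iloc` call you **cannot** mix position-based and label-based indexing: `.loc` takes labels for both rows and columns, and `.iloc` takes positions for both.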

```python
dataframe.loc[ <row>, <col> ]
```
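
A sketch of the positional counterpart on a made-up frame, plus one way to combine a label with a position: convert the label to a position first with `Index.get_loc()`, then use `.iloc` only.

```python
import pandas

frame = pandas.DataFrame({'family_name': ['LEE', 'WONG'],
                          'given_name': ['Hantang', 'Zhengyang'],
                          'age': [19, 20]},
                         index=['mr_candy', 'mr_joggy'])

by_label = frame.loc['mr_candy':'mr_joggy', 'family_name':'given_name']  # stops included
by_position = frame.iloc[0:2, 0:2]                                      # stops excluded

# mixing: turn the column label into a position, then use .iloc only
age_position = frame.columns.get_loc('age')
first_age = frame.iloc[0, age_position]

print(by_label.equals(by_position), first_age)
```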

In [12]:
```python
# obtain some entries from the dataframe
retrieved_partial_dataframe = dataframe.loc[ 'mr_candy' : 'mr_joggy' , 'family_name' : 'given_name' ]

# visualize
retrieved_partial_dataframe
```

| | family_name | given_name |
|---|---|---|
| mr_candy | LEE | Hantang |
| mr_joggy | WONG | Zhengyang |


You can also use masking to select rows based on a condition.

You can combine masking with slicing: pass the mask as the row selector of `.loc` and a column slice as the column selector.

In [13]:
```python
# Create a mask: adults whose family name is not 'WONG'
mask = (
    (dataframe['age'] >= 18) &
    (dataframe['family_name'] != 'WONG')
)

# visualize mask
mask
```

```
mr_candy     True
mr_joggy    False
ms_misty    False
dtype: bool
```

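* _Note: the operator `&` used here is an element-wise 'AND':_
  - _True & True == True_
  - _True & False == False_
  - _False & True == False_
  - _False & False == False_
* _For an element-wise 'OR', use `|`; for 'NOT', use `~`._
* _Wrap each comparison in parentheses, because `&` and `|` bind more tightly than comparison operators such as `>=` and `!=`. Python's `and` / `or` do not work element-wise on a Series._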
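
A sketch of these operators on a made-up `age` Series, including what goes wrong when the parentheses are left out:

```python
import pandas

age = pandas.Series(data=[19, 20, 10], index=['mr_candy', 'mr_joggy', 'ms_misty'])

both = (age >= 18) & (age < 20)      # AND
either = (age < 18) | (age > 19)     # OR
negated = ~(age >= 18)               # NOT

# Without parentheses, `age >= 18 & age < 20` parses as `age >= (18 & age) < 20`,
# a chained comparison that needs the truth value of a whole Series, which raises
try:
    age >= 18 & age < 20
    failed = False
except (ValueError, TypeError):
    failed = True

print(both.tolist(), either.tolist(), negated.tolist(), failed)
```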

In [14]:
```python
dataframe[mask]
```

| | family_name | given_name | gender | age |
|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 |


In [15]:
```python
dataframe.loc[ mask, 'family_name' : 'given_name' ]
```

| | family_name | given_name |
|---|---|---|
| mr_candy | LEE | Hantang |


* _Note: although `dataframe[mask]` works without `.loc[ ]`, selecting rows and columns together does not: use `dataframe.loc[mask, <columns>]`._

* _Note: a boolean mask selects a subset of rows (possibly just one), so the result is still a DataFrame, not a Series._

#### Adding rows or columns

In [16]:
```python
import pandas

# Again, some data first
mr_candy = pandas.Series(data = ['LEE', 'Hantang', 'Male', 19],
                         index = ['family_name', 'given_name', 'gender', 'age'])
mr_joggy = pandas.Series(data = ['WONG', 'Zhengyang', 'Male', 20],
                         index = ['family_name', 'given_name', 'gender', 'age'])
ms_misty = pandas.Series(data = ['カスミ', 'Female', 10],
                         index = ['given_name', 'gender', 'age'])   # for a missing field, simply leave out its label

# And a dataframe (row labels go in `index`)
dataframe = pandas.DataFrame(data = [mr_candy, mr_joggy, ms_misty], index = ['mr_candy', 'mr_joggy', 'ms_misty'])

# let's visualize
dataframe
```

| | family_name | given_name | gender | age |
|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 |
| mr_joggy | WONG | Zhengyang | Male | 20 |
| ms_misty | NaN | カスミ | Female | 10 |


```python
pandas.DataFrame.insert()
```

**loc** : *int*
Insertion index. Must verify 0 <= loc <= len(columns).

**column** : *str, number, or hashable object*
Label of the inserted column.

**value** : *Scalar, Series, or array-like*
Content of the inserted column.

**allow_duplicates** : *bool*, optional, default lib.no_default
Allow duplicate column labels to be created.
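
A sketch of `loc` and `allow_duplicates` on a made-up frame: `loc=0` puts the new column first, and inserting a label that already exists raises `ValueError` unless `allow_duplicates=True`.

```python
import pandas

frame = pandas.DataFrame({'age': [19, 20]})

# loc=0 inserts the new column at the front
frame.insert(loc=0, column='name', value=['LEE', 'WONG'])

# 'age' already exists, so a plain insert fails
try:
    frame.insert(loc=len(frame.columns), column='age', value=[0, 0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# with allow_duplicates=True the second 'age' column is accepted
frame.insert(loc=len(frame.columns), column='age', value=[0, 0], allow_duplicates=True)

print(frame.columns.tolist(), duplicate_rejected)
```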

In [17]:
```python
# Prepare data for a new column
interest = pandas.Series(data = ['chemistry', 'anatomy', 'Pokémon'], index = ['mr_candy', 'mr_joggy', 'ms_misty'])

# Adding a column
dataframe.insert(loc = len(dataframe.columns), column = 'interest', value = interest)

# visualize
dataframe
```

| | family_name | given_name | gender | age | interest |
|---|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 | chemistry |
| mr_joggy | WONG | Zhengyang | Male | 20 | anatomy |
| ms_misty | NaN | カスミ | Female | 10 | Pokémon |


* _Note: `len(dataframe.columns)` gives the total count of the columns._

In [18]:
```python
# You can also use the following format

# Prepare data for a new column
happiness = pandas.Series(data = [100, 50, 9999], index = ['mr_candy', 'mr_joggy', 'ms_misty'])

# Adding a column
dataframe['happiness'] = happiness

# visualize
dataframe
```

| | family_name | given_name | gender | age | interest | happiness |
|---|---|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 | chemistry | 100 |
| mr_joggy | WONG | Zhengyang | Male | 20 | anatomy | 50 |
| ms_misty | NaN | カスミ | Female | 10 | Pokémon | 9999 |
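
Assigning a Series with `dataframe[...] = ...` aligns on the index labels, not on position. A sketch with made-up data: the new Series is out of order and lacks one label, and that label simply becomes `NaN`.

```python
import pandas

frame = pandas.DataFrame({'age': [19, 20, 10]}, index=['mr_candy', 'mr_joggy', 'ms_misty'])

# out of order and missing 'ms_misty'
score = pandas.Series(data=[50, 100], index=['mr_joggy', 'mr_candy'])
frame['score'] = score

print(frame)
```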


Adding a row takes a bit more work.

- First option: use `dataframe.loc[<new key>] = <new data>` *(if an existing key is given, the data stored under that key is overwritten)*
- Second option: use `pandas.concat()`

In [19]:
```python
# Prepare a new row (entry)
mr_james = pandas.Series(data = ['Biden', 'James', 'Male', 100, 'mystery', 0],
                         index = ['family_name', 'given_name', 'gender', 'age', 'interest', 'happiness'])

# Insert (a new label enlarges the DataFrame)
dataframe.loc['mr_james'] = mr_james

# visualize
dataframe

# Note that you cannot insert with dataframe.iloc[len(dataframe)]:
# .iloc cannot enlarge the DataFrame, so X in iloc[X] must be in range [0, len(dataframe))
```

| | family_name | given_name | gender | age | interest | happiness |
|---|---|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 | chemistry | 100 |
| mr_joggy | WONG | Zhengyang | Male | 20 | anatomy | 50 |
| ms_misty | NaN | カスミ | Female | 10 | Pokémon | 9999 |
| mr_james | Biden | James | Male | 100 | mystery | 0 |


In [20]:
```python
# Prepare a new row (entry)
mr_tomas = pandas.Series(data = ['Harris', 'Tomas', 'Male', 85, 'mystery', 20],
                         index = ['family_name', 'given_name', 'gender', 'age', 'interest', 'happiness'])

# Wrap the row in a separate one-row DataFrame: pandas.concat() would treat
# a bare Series as a single COLUMN rather than as a row
temp_dataframe = pandas.DataFrame(data = [mr_tomas], index = ['mr_tomas'])

# !!! Make sure you do not use:
#     temp_dataframe = pandas.DataFrame(data = mr_tomas, index = ['mr_tomas'])
# Or 'mr_tomas' will be processed wrongly (the Series is reindexed to the label
# 'mr_tomas', which it does not contain), as shown below:
#                 0
#     mr_tomas  NaN

# visualize the temp dataframe
temp_dataframe
```

| | family_name | given_name | gender | age | interest | happiness |
|---|---|---|---|---|---|---|
| mr_tomas | Harris | Tomas | Male | 85 | mystery | 20 |


In [21]:
```python
# now concatenate two dataframes
dataframe = pandas.concat((dataframe, temp_dataframe), axis = 0, ignore_index = False, copy = True)

# you can delete the temp if you want
del temp_dataframe

# visualize
dataframe
```

| | family_name | given_name | gender | age | interest | happiness |
|---|---|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 | chemistry | 100 |
| mr_joggy | WONG | Zhengyang | Male | 20 | anatomy | 50 |
| ms_misty | NaN | カスミ | Female | 10 | Pokémon | 9999 |
| mr_james | Biden | James | Male | 100 | mystery | 0 |
| mr_tomas | Harris | Tomas | Male | 85 | mystery | 20 |
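
`pandas.concat()` is not limited to two objects. When several rows need adding, a common pattern is to collect them first and concatenate once; a sketch with made-up rows:

```python
import pandas

base = pandas.DataFrame({'given_name': ['Hantang'], 'age': [19]}, index=['mr_candy'])

# hypothetical new rows, each a one-row DataFrame
new_rows = [
    pandas.DataFrame({'given_name': ['James'], 'age': [100]}, index=['mr_james']),
    pandas.DataFrame({'given_name': ['Tomas'], 'age': [85]}, index=['mr_tomas']),
]

# one call, any number of DataFrames
combined = pandas.concat([base, *new_rows], axis=0)

print(combined)
```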


#### Arithmetic Operations

This arguably belongs with `pandas.Series`: we extract a column (a Series) from the DataFrame and perform operations on it.

In [22]:
"""
Make sure you have run all previous code blocks ONCE.
"""

# let's extract the column of 'happiness'
happiness:pandas.Series = dataframe['happiness']

# let's visualize 'happiness' first
happiness

mr_candy     100
mr_joggy      50
ms_misty    9999
mr_james       0
mr_tomas      20
Name: happiness, dtype: int64

In [23]:
```python
# now, let's try to normalize the happiness data we've got to [0, 1]
max_happiness: int = happiness.max()
min_happiness: int = happiness.min()

# we normalize
happiness = (happiness - min_happiness) / (max_happiness - min_happiness)

# visualize
happiness
```

```
mr_candy    0.010001
mr_joggy    0.005001
ms_misty    1.000000
mr_james    0.000000
mr_tomas    0.002000
Name: happiness, dtype: float64
```

In [24]:
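```python
# notice that our dataframe has NOT been altered:
# the arithmetic above created a NEW Series and rebound the name 'happiness' to it;
# nothing was written back into 'dataframe'
dataframe
```

| | family_name | given_name | gender | age | interest | happiness |
|---|---|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 | chemistry | 100 |
| mr_joggy | WONG | Zhengyang | Male | 20 | anatomy | 50 |
| ms_misty | NaN | カスミ | Female | 10 | Pokémon | 9999 |
| mr_james | Biden | James | Male | 100 | mystery | 0 |
| mr_tomas | Harris | Tomas | Male | 85 | mystery | 20 |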


In [25]:
```python
# let's assign back to the dataframe to actually change its values
dataframe['happiness'] = happiness

# visualize
dataframe
```

| | family_name | given_name | gender | age | interest | happiness |
|---|---|---|---|---|---|---|
| mr_candy | LEE | Hantang | Male | 19 | chemistry | 0.010001 |
| mr_joggy | WONG | Zhengyang | Male | 20 | anatomy | 0.005001 |
| ms_misty | NaN | カスミ | Female | 10 | Pokémon | 1.0 |
| mr_james | Biden | James | Male | 100 | mystery | 0.0 |
| mr_tomas | Harris | Tomas | Male | 85 | mystery | 0.002 |


For this line of code:
```python
# we normalize
happiness = (happiness - min_happiness) / (max_happiness - min_happiness)
```
notice that '-' and '/' are applied to every element of the Series `happiness`.
-> That is exactly what is supposed to happen: an operation between a Series and a scalar is applied element-wise.

Now, how do we apply a different value to each element?
-> Create another Series whose index uses the same labels; pandas matches (aligns) the two Series by label.

In [26]:
```python
# let's retrieve 'happiness' again
happiness: pandas.Series = dataframe['happiness']

# let's create a mapping
mapping: pandas.Series = pandas.Series(data = [1, 10, 100, 1000], index = ['mr_candy', 'mr_joggy', 'ms_misty', 'mr_james'])

# let's perform addition once
happiness += mapping

# visualize
happiness
```

```
mr_candy       1.010001
mr_joggy      10.005001
ms_misty     101.000000
mr_james    1000.000000
mr_tomas            NaN
Name: happiness, dtype: float64
```

\* By the way, I intentionally left out the mapping for 'mr_tomas'. As you can see, a label present in only one of the two Series gets `NaN` (Not a Number) as the result.
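
If a missing label should count as 0 instead of producing `NaN`, `Series.add()` accepts a `fill_value`. A sketch with made-up numbers:

```python
import pandas

happiness = pandas.Series(data=[0.5, 1.0, 0.2], index=['mr_candy', 'ms_misty', 'mr_tomas'])
mapping = pandas.Series(data=[1, 100], index=['mr_candy', 'ms_misty'])

with_nan = happiness + mapping                    # 'mr_tomas' -> NaN
with_fill = happiness.add(mapping, fill_value=0)  # 'mr_tomas' keeps 0.2

print(with_nan)
print(with_fill)
```

The other arithmetic operators have matching methods (`sub`, `mul`, `div`) that take the same `fill_value` argument.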

#### With 'helpful' NumPy

We can also convert pandas.Series back to `numpy.ndarray` to perform some advanced mathematical operations.

In [27]:
```python
import numpy

# this time, let's use the 'age' column
# (int64 rather than int8: int8 tops out at 127, which is too tight for ages)
ages: numpy.ndarray = dataframe['age'].to_numpy(dtype = numpy.int64)

# visualize
print("-------- Age as numpy array --------")
print(ages)

# calculate some meaningful attributes using numpy functions
print("--------------- Max ----------------")
print(ages.max())
print("--------------- Min ----------------")
print(ages.min())
print("--------------- Mean ---------------")
print(ages.mean())
print("-------- Standard Deviation --------")
print(ages.std())

# Note: pandas also has max(), min(), mean() and std() built in.
# Careful: pandas' std() defaults to the sample standard deviation (ddof=1),
# while NumPy's std() defaults to the population one (ddof=0).

print("\n=> Feel free to try anything numpy provides!")
```

```
-------- Age as numpy array --------
[ 19  20  10 100  85]
--------------- Max ----------------
100
--------------- Min ----------------
10
--------------- Mean ---------------
46.8
-------- Standard Deviation --------
37.7751240898028

=> Feel free to try anything numpy provides!
```
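
A sketch of the `ddof` difference mentioned above, using the same five ages as plain data:

```python
import numpy
import pandas

ages = pandas.Series(data=[19, 20, 10, 100, 85])

pandas_std = ages.std()                   # sample std, ddof=1
numpy_std = numpy.std(ages.to_numpy())    # population std, ddof=0
pandas_pop_std = ages.std(ddof=0)         # pandas, matched to NumPy's default

print(pandas_std, numpy_std, pandas_pop_std)
```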


### Working with .CSV files

A CSV _(comma-separated values)_ file is a simple text file format for storing tabular data, where each line is a record and values within each line are separated by commas. The .csv extension indicates the file type, which can be opened by spreadsheet programs like Microsoft Excel or Google Sheets, text editors, and other applications for data import and export. CSVs are a common, non-proprietary method for transferring data between different programs and a convenient way to store large datasets in a compact format.

* Save your spreadsheet from Excel as a .csv file, and let's begin ...
_(Some samples are already provided in the ../SampleData folder)_

#### Read a .CSV

```python
# create a dataframe with .csv file
dataframe_from_csv = pandas.read_csv(
    # -> file path
    filepath_or_buffer = "../SampleData/sample_data_0.csv",
    # -> field separator, for .csv usually ','
    sep = ',',
    # -> values in the .csv to be treated as absent (NaN)
    na_values = ["N/a"],
    # -> column to use as the index (in the previous examples 'mr_xxx' was the index; here we choose the column named 'id')
    index_col = 'id')
```
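
Since `../SampleData/sample_data_0.csv` may not be at hand, here is a self-contained sketch that feeds `read_csv` a small in-memory CSV through `io.StringIO`. The rows are made up, loosely modelled on the sample's columns. Note that `'N/a'` is not among pandas' default missing-value markers, which is why `na_values` is needed.

```python
import io
import pandas

csv_text = """id,compound_name,value_1,state
1,Fabaceae,998.87,+
2,N/a,171.89,-
3,Brassicaceae,,-
"""

frame = pandas.read_csv(
    filepath_or_buffer=io.StringIO(csv_text),
    sep=',',
    na_values=["N/a"],   # 'N/a' becomes NaN (empty fields already do by default)
    index_col='id')      # the 'id' column becomes the index

print(frame)
```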

In [28]:
```python
import pandas

# create a dataframe with .csv file
dataframe_from_csv = pandas.read_csv(
    filepath_or_buffer = "../SampleData/sample_data_0.csv",
    sep = ',',
    na_values = ["N/a"],
    index_col = 'id')

# visualize our created dataframe
dataframe_from_csv
```

| id | compound_name | value_1 | value_2 | value_3 | state | colour |
|---|---|---|---|---|---|---|
| 1 | Fabaceae | 998.87073 | -2.29873 | NaN | + | Fuscia |
| 2 | Asteraceae | 171.88994 | -33.46525 | 1733.0 | - | Teal |
| 3 | Brassicaceae | 369.56489 | 52.29384 | NaN | - | Maroon |
| 4 | Fabaceae | 946.71690 | -11.75319 | 1128.0 | - | Purple |
| 5 | Onagraceae | 598.29730 | NaN | 1588.0 | + | Blue |
| ... | ... | ... | ... | ... | ... | ... |
| 996 | Parmeliaceae | NaN | -29.04800 | 1218.0 | - | Orange |
| 997 | Hippocastanaceae | 187.65471 | -31.18477 | NaN | - | Maroon |
| 998 | Asteraceae | 340.12488 | 35.05716 | NaN | + | Violet |
| 999 | Polygonaceae | 5.43529 | -7.24339 | NaN | - | Crimson |


#### Write to a .CSV

You can also save your pandas DataFrame as a .csv file.

-> use the `<dataframe_object>.to_csv()` method

```python
dataframe.to_csv(
    # -> save path
    path_or_buf = "./mr_xxx.csv", 
    # -> field separation symbol, for .csv file, usually ','
    sep = ',',
    # -> what to write for fields with missing value
    na_rep = "N/a",
    # -> whether to write columns' names at the first row of .csv file
    header = True,
    # -> whether to write index column (in this example, whether to write 'mr_xxx' in file)
    index = True,
    # -> Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used
    index_label = "Name")
```
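
When `path_or_buf` is omitted, `to_csv()` returns the CSV text as a string, which makes it easy to check what would be written. A sketch with a made-up frame, read back with `read_csv` (the `'N/a'` marker has to be declared as missing again on the way in):

```python
import io
import numpy
import pandas

frame = pandas.DataFrame({'given_name': ['Hantang', 'カスミ'],
                          'family_name': ['LEE', numpy.nan]},
                         index=['mr_candy', 'ms_misty'])

csv_text = frame.to_csv(sep=',', na_rep="N/a", header=True, index=True, index_label="Name")
print(csv_text)

# reading it back: 'Name' becomes the index and 'N/a' becomes NaN again
restored = pandas.read_csv(io.StringIO(csv_text), na_values=["N/a"], index_col="Name")
```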

In [None]:
```python
# let's save the dataframe with the 'mr_xxx' rows as a .csv file
dataframe.to_csv(
    path_or_buf = "./mr_xxx.csv",
    sep = ',',
    na_rep = "N/a",
    header = True,
    index = True,
    index_label = "Name")

# the file "mr_xxx.csv" is written to the current working directory
# (for a notebook, usually the folder containing the .ipynb file)
```