# Basics

---

This is a brief introduction to **Pandas**, a data analysis library for Python. The second lecture goes further. In this quick intro, we cover one of the most important aspects of data analysis, **how to read and write different data formats**, along with checking data size and shape.


### Lecture outline

---

* Read Data
* Write Data
* Data Size and Shape
* Summary Statistics
* Unique Observations
* Value Counts

#### Reference


[IO tools (text, CSV, HDF5, …)](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#sql-queries)

[Pandas: How to Read and Write Files](https://realpython.com/pandas-read-write-files/)

In [None]:
import pandas as pd

## Read Data

---

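Reading a data file is the first operation you will perform in any data analysis, so you need to be able to read different file formats. Let's start with the simplest and most common one, the CSV file. The cells after it read Excel, Stata, SAS, SPSS, and JSON files in the same way. Note that some readers rely on optional dependencies: for example, `read_excel` needs an engine such as `openpyxl` for `.xlsx` files, and `read_spss` needs `pyreadstat`.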

In [None]:
csv_file = pd.read_csv("data/admission.csv")

csv_file.head()

In [None]:
excel_file = pd.read_excel("data/titanic.xlsx")

excel_file.head()

In [None]:
stata_file = pd.read_stata("data/airline.dta")

stata_file.head()

In [None]:
sas_file = pd.read_sas("data/alcohol.sas7bdat")

sas_file.head()

In [None]:
spss_file = pd.read_spss("data/sleep.sav")

spss_file.head()

In [None]:
json_file = pd.read_json("data/example.json")

json_file.head()
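The readers above all follow the same pattern: pass a source, get a `DataFrame` back. As a minimal sketch of how reader options work, the example below parses a small semicolon-delimited CSV from an in-memory string instead of a file on disk. The column names and values are made up for illustration and are not taken from `data/admission.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk (illustrative values only).
csv_text = "Serial No.;GRE Score;Research\n1;337;1\n2;324;1\n3;316;0\n"

# read_csv accepts a path or any file-like object.
# sep sets the delimiter; index_col picks the column to use as the row index.
sample = pd.read_csv(io.StringIO(csv_text), sep=";", index_col="Serial No.")

print(sample.head())
```

The same `sep` and `index_col` arguments can be passed when reading from a real file path.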

## Write Data

---

Writing data to a file follows almost the same procedure as reading. When writing, we pass the path of the new file, including its name. Let's write a CSV and an Excel file; other formats work in much the same way.

In [None]:
csv_file.to_csv("data/new_csv_file.csv")

In [None]:
excel_file.to_excel("data/new_excel_file.xlsx")
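One detail worth knowing when writing: by default, `to_csv` also writes the row index as the first column, so reading the file back produces an extra `Unnamed: 0` column. The sketch below illustrates this with a small made-up `DataFrame` and a temporary directory (both are assumptions for the example, not part of the lecture data), and shows how `index=False` avoids it.

```python
import os
import tempfile

import pandas as pd

# Small illustrative DataFrame (hypothetical values).
df = pd.DataFrame({"GRE Score": [337, 324, 316], "Research": [1, 1, 0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, "with_index.csv")
    without_index = os.path.join(tmp_dir, "without_index.csv")

    df.to_csv(with_index)                  # default: the index is written as the first column
    df.to_csv(without_index, index=False)  # skip the index

    reread_with_index = pd.read_csv(with_index)
    reread_without_index = pd.read_csv(without_index)

print(reread_with_index.columns.tolist())     # ['Unnamed: 0', 'GRE Score', 'Research']
print(reread_without_index.columns.tolist())  # ['GRE Score', 'Research']
```

`to_excel` accepts the same `index=False` argument.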

## Data Size and Shape

---

We will talk about data size and shape in more detail in the second lecture. Here, we quickly cover what they are and how to read them.

In [None]:
csv_file.size # Returns the number of elements (rows * columns) in the DataFrame

In [None]:
csv_file.shape # Returns the number of rows and columns, respectively
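The two attributes are related: `size` is simply the product of the two numbers in `shape`. A minimal sketch on a made-up `DataFrame` (the values are illustrative only):

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical values): 3 rows, 2 columns.
df = pd.DataFrame({"GRE Score": [337, 324, 316], "Research": [1, 1, 0]})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple

print(df.shape)  # (3, 2)
print(df.size)   # 6, which equals n_rows * n_cols
print(len(df))   # 3, the number of rows
```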

## Summary Statistics

---

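The `.describe()` method returns summary statistics of your data, such as the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column. It gives you a quick overview of the data at hand.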

In [None]:
csv_file.describe()
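By default, `describe()` summarizes only the numeric columns of a mixed `DataFrame`. As a sketch, the example below uses a made-up `DataFrame` with one text column (the `University` column is hypothetical) and passes `include="all"` to summarize every column.

```python
import pandas as pd

# Illustrative DataFrame with one numeric and one text column (hypothetical values).
df = pd.DataFrame(
    {
        "GRE Score": [337, 324, 316, 322],
        "University": ["A", "B", "A", "C"],
    }
)

numeric_summary = df.describe()           # numeric columns only
full_summary = df.describe(include="all")  # adds unique, top, freq for text columns

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example, the mean of a text column) appear as `NaN`.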

## Unique Observations

---

Summary statistics do not tell us how many unique values each column contains. We can list the unique values of a column with the `.unique()` method.

In [None]:
csv_file.head()

In [None]:
csv_file["Research"].unique() # Only two unique values in "Research" column
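While `.unique()` returns the distinct values themselves, `.nunique()` returns how many there are, and it also works on a whole `DataFrame`. A minimal sketch on made-up data (the values are illustrative, not from the admission file):

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical values).
df = pd.DataFrame({"GRE Score": [337, 324, 316, 324], "Research": [1, 1, 0, 1]})

values = df["Research"].unique()     # distinct values, in order of first appearance
n_unique = df["Research"].nunique()  # number of distinct values in one column
per_column = df.nunique()            # number of distinct values in every column

print(values)
print(n_unique)
print(per_column)
```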

## Value Counts

---

We can count how many times each distinct value occurs in a column with the `.value_counts()` method.

In [None]:
csv_file.head()

In [None]:
csv_file["Research"].value_counts() # We have 219 ones and 181 zeros, totaling 400
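Raw counts are often easier to interpret as proportions. As a sketch, passing `normalize=True` to `value_counts()` returns the share of each value instead of its count. The data below is made up for illustration.

```python
import pandas as pd

# Small illustrative column (hypothetical values).
df = pd.DataFrame({"Research": [1, 1, 0, 1, 0]})

counts = df["Research"].value_counts()                # occurrences of each value
shares = df["Research"].value_counts(normalize=True)  # fraction of rows per value

print(counts)
print(shares)
```

The counts always sum to the number of non-missing rows, and the shares sum to 1.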

# Summary

---

In this lecture, we learned how to set up our working environment and how to install the libraries needed for data analysis. Moreover, we covered one of the most important aspects of data analysis: reading and writing data. In the next lecture, we will explore more of the capabilities of `Pandas`.