In this class, you'll learn the basics of pandas, the most popular Python library for data analysis.
This lecture covers:
1. Creating Data
2. Reading Data
3. Writing Data
___

# Getting Started

To use pandas, we first need to import the library with the following line of code:

In [None]:
import pandas as pd

**Note**: If you need help with the initial setup, visit the README section of the repository.
___

# Creating Data

There are two core objects in pandas: the **DataFrame** and the **Series**.

## DataFrame

A DataFrame is a table. It contains an array of individual entries. Each entry corresponds to a row (or record) and a column. 

For example:<br>
<img src = Images/Class1/Image1.png width=400 height=400> 

In [None]:
#Pandas code to make a table like above
pd.DataFrame({'Girls':[75, 80], 'Boys':[60, 55]})

We use the **pd.DataFrame()** constructor to generate DataFrame objects.<br>The syntax for declaring a new one is a dictionary whose keys are the column names (Girls and Boys in this example) and whose values are lists of entries.<br>This is the standard way of constructing a new DataFrame.<br>

The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is fine, but oftentimes we will want to assign these labels ourselves.

The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an **index** parameter in our constructor:

In [None]:
#Adding indexes
pd.DataFrame({'Girls':[75, 80], 'Boys':[60, 55]}, index=['X grade', 'XII grade'])

**Note**: DataFrame entries are not limited to integers and can include a variety of data types like strings.
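
As a small illustration of the note above, the sketch below builds a DataFrame whose columns hold different data types. The column names and values are made up for this example and are not part of the lecture's data.

```python
import pandas as pd

# Hypothetical data: each column holds a different type of value
mixed = pd.DataFrame({
    'Name': ['Asha', 'Ravi'],    # strings
    'Score': [91.5, 78.0],       # floating-point numbers
    'Passed': [True, True],      # booleans
})

# The dtypes attribute lists the data type pandas inferred for each column
print(mixed.dtypes)
```

Each column keeps a single data type, while different columns in the same DataFrame can have different types.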

#### Assignment 1
Create a Pandas DataFrame that looks like:<br><img src="Images/Class1/Image2.png" width=400 height=400>

In [None]:
#Write your code below:

Solution:

In [None]:
pd.DataFrame({'Husband':['Adam', 'Chris'], 'Wife':['Beth', 'Daisy']})

## Series

A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list.

For example:<br>
<img src = Images/Class1/Image3.png width=300 height=300> 

In [None]:
#Pandas code to make a Series like above
pd.Series([1, 2, 3, 4, 5])

You can assign row labels to the Series the same way as before, using an index parameter. However, a Series does not have a column name; it only has one overall name.

In [None]:
#Adding indexes; the 'name' parameter gives the Series a title
pd.Series([1, 2, 3, 4, 5], index=['No.1', 'No.2', 'No.3', 'No.4', 'No.5'], name='Numbers_Series')

<br>

#### Assignment 2
Create a Pandas Series with the name **'Number of students'** that looks like:<br><img src="Images/Class1/Image4.png" width=300 height=300>

In [None]:
#Write your code below:

Solution:

In [None]:
pd.Series([60, 70, 80, 90, 100], index=['I grade', 'II grade', 'III grade', 'IV grade', 'V grade'], name='Number of students')

**Note:** The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together".
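
One way to see the "glued together" idea is to build two Series and join them side by side. This is only a sketch: it reuses the Girls/Boys numbers from the earlier example, and `pd.concat` is just one of several ways to combine Series into a DataFrame.

```python
import pandas as pd

# Two Series that share the same row labels
girls = pd.Series([75, 80], index=['X grade', 'XII grade'], name='Girls')
boys = pd.Series([60, 55], index=['X grade', 'XII grade'], name='Boys')

# Glue the Series together as columns; each Series name becomes a column name
marks = pd.concat([girls, boys], axis=1)
print(marks)

# Selecting a single column from the DataFrame gives back a Series
girls_column = marks['Girls']
print(type(girls_column))
```

Going the other way, selecting one column of a DataFrame returns a Series whose name is the column label.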
___

# Reading Data Files

Most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

Data can be stored in any of a number of different forms and formats. By far the most basic of these is the **CSV file**. A CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.

We'll use the **pd.read_csv()** function to read the data into a DataFrame.

In [None]:
wine_reviews = pd.read_csv("Datasets/winemag-data-130k-v2.csv")
#You can download this dataset from: https://www.kaggle.com/datasets/zynicide/wine-reviews. It is also present within the Datasets folder of the repository
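
If the wine reviews file is not at hand, the behaviour of `pd.read_csv()` can still be tried on a tiny piece of CSV text. In the sketch below, the product data is invented for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Made-up CSV content: the first line holds the column names
csv_text = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
"""

# read_csv accepts a file path or a file-like object such as StringIO
sales = pd.read_csv(io.StringIO(csv_text))
print(sales)
```

The first line of the text becomes the column labels, and the rows receive the default index starting at 0.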


## shape
To determine how large the dataset is, we can use the **shape** attribute. Because it is an attribute and not a method, it is written without parentheses.

In [None]:
wine_reviews.shape

So our new DataFrame has 129,971 records split across 14 different columns. 
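
Since `shape` is a tuple of the form (rows, columns), it can be unpacked into two variables. A minimal sketch on the small Girls/Boys DataFrame from earlier:

```python
import pandas as pd

marks = pd.DataFrame({'Girls': [75, 80], 'Boys': [60, 55]})

# shape is a (number of rows, number of columns) tuple
n_rows, n_cols = marks.shape
print(f"{n_rows} rows, {n_cols} columns")
```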


## head()
We can examine the contents of the DataFrame using the **head()** method, which returns the first five rows by default.

In [None]:
wine_reviews.head()
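
`head()` also accepts the number of rows to return. The sketch below uses a small made-up DataFrame so that it runs without the wine dataset.

```python
import pandas as pd

# Hypothetical ten-row DataFrame of numbers and their squares
numbers = pd.DataFrame({'n': list(range(1, 11)),
                        'square': [n * n for n in range(1, 11)]})

first_five = numbers.head()    # default: first five rows
first_three = numbers.head(3)  # first three rows only
print(first_three)
```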