## This is a demonstration of how pandas is used

To load the pandas package and start working with it, import the package. The community-agreed alias for pandas is `pd`, so loading pandas as `pd` is standard practice throughout the pandas documentation.

In [27]:
import pandas as pd

I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and sex (male/female) data.

In [28]:
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

df

|   | Name | Age | Sex |
|---|------|-----|-----|
| 0 | Braund, Mr. Owen Harris | 22 | male |
| 1 | Allen, Mr. William Henry | 35 | male |
| 2 | Bonnell, Miss. Elizabeth | 58 | female |


To manually store data in a table, create a DataFrame. When using a Python dictionary of lists, the dictionary keys will be used as column headers and the values in each list as columns of the DataFrame.
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.
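
As a minimal sketch of columns with different types, the example below builds a small `DataFrame` from a dictionary of lists and inspects the per-column dtypes. The `Fare` and `Class` columns and their values are illustrative assumptions, not part of the original table.

```python
import pandas as pd

# Each dictionary key becomes a column label; each list becomes one column.
# The "Fare" and "Class" columns are illustrative values only.
mixed = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"],
        "Age": [22, 35],                          # integers
        "Fare": [7.25, 8.05],                     # floating point values
        "Class": pd.Categorical(["3rd", "3rd"]),  # categorical data
    }
)

# Each column keeps its own dtype, independent of the other columns.
print(mixed.dtypes)
print(list(mixed.columns))
```

The column order follows the order of the dictionary keys.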

The table has 3 columns, each of them with a column label. The column labels are respectively Name, Age and Sex.
The column Name consists of textual data with each value a string, the column Age consists of numbers, and the column Sex is textual data as well.
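
The column labels and column types described above can be checked directly on the table. A short sketch that rebuilds the same `df`; note that the exact dtype shown for text columns depends on the pandas version (typically `object`, while newer versions may use a dedicated string dtype).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

print(df.columns.tolist())  # the three column labels: Name, Age, Sex
print(df.shape)             # (number of rows, number of columns)
# Age is an integer column; Name and Sex are text columns
# (shown as object or as a string dtype, depending on the pandas version).
print(df.dtypes)
```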

## Each column in a DataFrame is a `Series`

I’m just interested in working with the data in the column Age

In [29]:
df["Age"]

0    22
1    35
2    58
Name: Age, dtype: int64

When selecting a single column of a pandas DataFrame, the result is a pandas Series. To select the column, use the column label in between square brackets [].

If you are familiar with Python dictionaries, selecting a single column is very similar to selecting a dictionary value based on its key.
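
To make the dictionary analogy concrete, the sketch below looks up the same key in the plain dictionary and in the `DataFrame` built from it. The variable names `data`, `ages_list` and `ages_series` are illustrative.

```python
import pandas as pd

data = {
    "Name": [
        "Braund, Mr. Owen Harris",
        "Allen, Mr. William Henry",
        "Bonnell, Miss. Elizabeth",
    ],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
}
df = pd.DataFrame(data)

# Dictionary lookup: key -> the original Python list
ages_list = data["Age"]
# DataFrame lookup: column label -> a pandas Series holding the same values
ages_series = df["Age"]

print(type(ages_list))    # a plain Python list
print(type(ages_series))  # a pandas Series
print(ages_series.name)   # the Series keeps the column label as its name
```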

A pandas `Series` has no column labels, as it is just a single column of a `DataFrame`. A `Series` does have row labels. You can create a `Series` from scratch as well:

In [30]:
ages = pd.Series([22, 35, 58], name="Age")

ages


0    22
1    35
2    58
Name: Age, dtype: int64
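
The row labels of a `Series` live in its `index`. A minimal sketch: without an explicit index, pandas assigns the default labels 0, 1, 2; the surname labels in the second `Series` are a hypothetical choice for illustration.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")
# Default row labels: a RangeIndex from 0 up to (but excluding) 3.
print(ages.index)  # RangeIndex(start=0, stop=3, step=1)

# Hypothetical custom row labels passed via the index parameter.
labelled = pd.Series([22, 35, 58], index=["Braund", "Allen", "Bonnell"], name="Age")
print(labelled["Allen"])  # 35, looked up by row label
```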

## Do something with a DataFrame or Series

I want to know the maximum Age of the passengers

We can do this on the `DataFrame` by selecting the `Age` column and applying `max()`:

In [31]:
df["Age"].max()

58

We can also apply `max()` directly to the `ages` `Series` created earlier:

In [32]:
ages.max()

58

As illustrated by the `max()` method, you can do things with a `DataFrame` or `Series`. pandas provides a lot of functionality, each piece of it a method you can apply to a `DataFrame` or `Series`. As methods are functions, do not forget to use parentheses `()`.

In other words, `max()` is a **method** of a `DataFrame` or `Series`.
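
A short sketch of a few other methods on the same `ages` `Series`, and of what happens when the parentheses are forgotten: you get the method object itself rather than a computed value.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

print(ages.min())   # 22
print(ages.mean())  # about 38.33, i.e. 115 / 3
print(ages.max())   # 58

# Without parentheses the method is not called:
# this prints the bound method object, not a number.
print(ages.max)
```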

Another example computation, taken from Gradescope, is given below. Arithmetic on a `Series` is applied element-wise, so adding 1 increases every age by one:

In [33]:
ages + 1

0    23
1    36
2    59
Name: Age, dtype: int64
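
The same element-wise behaviour holds for the other arithmetic operators. A small sketch showing that the result is a new `Series` with the same name, while the original `ages` is left unchanged:

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

older = ages + 1    # 1 is added to every value
doubled = ages * 2  # multiplication is applied element-wise as well

print(older.tolist())    # [23, 36, 59]
print(doubled.tolist())  # [44, 70, 116]

# A new Series is returned; the original values are not modified.
print(ages.tolist())     # [22, 35, 58]
```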

## I’m interested in some basic statistics of the numerical data of my data table

In [34]:
df.describe()

|       | Age |
|-------|-----|
| count | 3.0 |
| mean | 38.333333 |
| std | 18.230012 |
| min | 22.0 |
| 25% | 28.5 |
| 50% | 35.0 |
| 75% | 46.5 |
| max | 58.0 |


The `describe()` method provides a quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe()` method.
Many pandas operations return a `DataFrame` or a `Series`. The `describe()` method is one example: called on a `DataFrame` it returns a `DataFrame`, and called on a `Series` it returns a `Series`.
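
The sketch below checks the return types of `describe()` and uses its `include="all"` option, which also summarises the text columns (with statistics such as `count`, `unique`, `top` and `freq`) instead of skipping them.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# Default: only the numeric column Age is summarised; the result is a DataFrame.
summary = df.describe()
print(type(summary).__name__)  # DataFrame

# include="all" adds the text columns Name and Sex to the summary.
full = df.describe(include="all")
print(full)

# On a single column (a Series), describe() returns a Series.
print(type(df["Age"].describe()).__name__)  # Series
```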

More options for `describe()` are covered in the user guide section on [aggregations with describe](https://pandas.pydata.org/docs/user_guide/basics.html#basics-describe).

The cells above are just a starting point. Similar to spreadsheet software, pandas represents data as a table with columns and rows. Beyond the representation, pandas also supports the data manipulations and calculations you would do in spreadsheet software. Continue reading the next tutorials to get started!
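
As a small taste of those spreadsheet-style operations, the sketch below adds a computed column (like a formula column), filters rows on a condition, and sorts by a column. The `Age in months` column is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# Like a formula column in a spreadsheet: derive a new column from an existing one.
df["Age in months"] = df["Age"] * 12

# Like a spreadsheet filter: keep only the rows where the condition is True.
over_30 = df[df["Age"] > 30]

# Like sorting a spreadsheet by a column, here from oldest to youngest.
by_age = df.sort_values("Age", ascending=False)

print(over_30)
print(by_age)
```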

A more extended explanation of `DataFrame` and `Series` is provided on the user guide page [introduction to data structures](https://pandas.pydata.org/docs/user_guide/dsintro.html#dsintro).

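**NB:** The statement "The DataFrame class extends the DataTable virtual class and supports the storage of any type of object (with length and [ methods) as columns" describes the `DataFrame` class of the R/Bioconductor S4Vectors package, not the pandas `DataFrame`.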
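
pandas has no `DataTable` class; the closest pandas counterpart to storing arbitrary objects in a column is the `object` dtype. A minimal sketch, using hypothetical values (a list, a dict and a tuple) in a single column:

```python
import pandas as pd

# Hypothetical mixed Python objects stored in one column.
misc = pd.Series([[1, 2], {"a": 1}, (3, 4)], name="Misc")
print(misc.dtype)  # object

# The same Series can be used as a DataFrame column.
table = pd.DataFrame({"Misc": misc})
print(table)
```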