# What kind of data does pandas handle?

I want to start using pandas.

In [37]:
import pandas as pd

To load the pandas package and start working with it, import the package. 

The community-agreed alias for pandas is pd, so importing pandas as pd is assumed to be standard practice throughout the pandas documentation.

## pandas data table representation

In [38]:
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)


A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. 

It is similar to a spreadsheet, a SQL table or the data.frame in R.

The table has 3 columns, each of them with a column label. The column labels are respectively Name, Age and Sex.

The column Name consists of textual data with each value a string, the column Age holds numbers, and the column Sex is textual data.

In spreadsheet software, the table representation of our data would look very similar:

In [39]:
df

                       Name  Age     Sex
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female


To manually store data in a table, create a DataFrame. 

When using a Python dictionary of lists, the dictionary keys will be used as column headers and the values in each list as columns of the DataFrame.
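As a minimal sketch of the key-to-column mapping described above, the following builds a small table from a dictionary of lists and inspects the result. The `fruit` and `count` data and the name `inventory` are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical example data: each key is a column label,
# each list holds the values of that column.
data = {
    "fruit": ["apple", "pear", "plum"],
    "count": [3, 5, 2],
}
inventory = pd.DataFrame(data)

# The dictionary keys become the column labels, in insertion order.
print(list(inventory.columns))

# Each list contributes one column, so the shape is (rows, columns).
print(inventory.shape)
```

All lists are expected to have the same length, since each one fills a column over the same rows.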

In [40]:
df["Age"]

0    22
1    35
2    58
Name: Age, dtype: int64

Each column in a DataFrame is a Series.

You can create a Series from scratch as well:

In [41]:
ages = pd.Series([22, 23, 24, 25], name="Age")

In [42]:
ages

0    22
1    23
2    24
3    25
Name: Age, dtype: int64

A pandas Series has no column labels, as it is just a single column of a DataFrame. A Series does have row labels.
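The relationship between a column, its name and its row labels can be checked directly. The sketch below rebuilds a reduced version of the table (two columns only, an assumption made to keep it short) and also creates a Series with custom row labels; the labels `"a"`, `"b"`, `"c"` and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"],
        "Age": [22, 35],
    }
)

# Selecting a single column returns a Series, not a DataFrame.
age_column = df["Age"]
print(type(age_column).__name__)

# The Series keeps the column label as its name and the row labels as its index.
print(age_column.name)
print(list(age_column.index))

# Row labels do not have to be integers (hypothetical labels for illustration).
labeled_ages = pd.Series([22, 23, 24], index=["a", "b", "c"], name="Age")
print(labeled_ages["b"])
```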

## Do something with a DataFrame or Series

I want to know the maximum Age of the passengers.

We can do this on the DataFrame by selecting the Age column and applying max(), or by applying max() directly to the Series:



In [43]:
df["Age"].max()

58

In [44]:
ages.max()

25

As illustrated by the max() method, you can do things with a DataFrame or Series. 

pandas provides a lot of functionality, each piece of it a method you can apply to a DataFrame or Series.

As methods are functions, do not forget to use parentheses ().
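To illustrate why the parentheses matter, this short sketch compares referring to a method with calling it. The variable names `max_method` and `max_value` are only illustrative.

```python
import pandas as pd

ages = pd.Series([22, 23, 24, 25], name="Age")

# Without parentheses you get the method object itself, not a result.
max_method = ages.max
print(callable(max_method))

# With parentheses the method is called and returns the maximum value.
max_value = ages.max()
print(max_value)
```

Forgetting the parentheses does not raise an error, which is why the mistake is easy to miss: the expression simply evaluates to a bound method.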

In [45]:
df.describe()

             Age
count   3.000000
mean   38.333333
std    18.230012
min    22.000000
25%    28.500000
50%    35.000000
75%    46.500000
max    58.000000


The describe() method provides a quick overview of the numerical data in a DataFrame. 

As the Name and Sex columns are textual data, these are by default not taken into account by the describe() method.

Many pandas operations return a DataFrame or a Series. The describe() method is an example of a pandas operation returning a pandas Series or a pandas DataFrame.
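Which of the two types comes back depends on what describe() is called on. A small sketch, assuming a reduced version of the passenger table with only the Name and Age columns:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
        "Age": [22, 35, 58],
    }
)

# Called on a DataFrame, describe() returns a DataFrame
# with one column per numerical column.
table_summary = df.describe()
print(type(table_summary).__name__)

# Called on a single column (a Series), it returns a Series.
column_summary = df["Age"].describe()
print(type(column_summary).__name__)

# Individual statistics can be looked up by their row label.
print(column_summary["max"])
```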

# How do I read and write tabular data?

In [46]:
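# Raw string (r"...") so the backslashes in the Windows path are not treated as escape sequences.
df = pd.read_csv(r"G:\data_science\data\iris.csv")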

pandas provides the read_csv() function to read data stored as a CSV file into a pandas DataFrame. pandas supports many different file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, …), each with a reader function that has the prefix read_*.
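The file path above is specific to one machine. As a self-contained sketch, read_csv() can also read from any file-like object, so the example below parses an in-memory string instead. The two data rows are copied from the iris output shown in this section; the names `csv_text` and `iris_sample` are hypothetical.

```python
import io

import pandas as pd

# Two rows of the iris data as CSV text, standing in for a file on disk.
csv_text = """sepal.length,sepal.width,petal.length,petal.width,variety
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
"""

# io.StringIO wraps the string in a file-like object that read_csv() accepts.
iris_sample = pd.read_csv(io.StringIO(csv_text))

print(iris_sample.shape)
print(list(iris_sample.columns))
```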

In [47]:
df.head()

   sepal.length  sepal.width  petal.length  petal.width  variety
0           5.1          3.5           1.4          0.2   Setosa
1           4.9          3.0           1.4          0.2   Setosa
2           4.7          3.2           1.3          0.2   Setosa
3           4.6          3.1           1.5          0.2   Setosa
4           5.0          3.6           1.4          0.2   Setosa


To see the first N rows of a DataFrame, use the head() method with the required number of rows as argument (in this case, the default of 5 rows is shown).


In [48]:
df.tail()

     sepal.length  sepal.width  petal.length  petal.width    variety
145           6.7          3.0           5.2          2.3  Virginica
146           6.3          2.5           5.0          1.9  Virginica
147           6.5          3.0           5.2          2.0  Virginica
148           6.2          3.4           5.4          2.3  Virginica
149           5.9          3.0           5.1          1.8  Virginica


To see the last N rows of a DataFrame, use the tail() method with the required number of rows as argument (in this case, the default of 5 rows is shown).
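Both head() and tail() accept the number of rows explicitly. A minimal sketch on a hypothetical ten-row table (the column name `n` is made up for illustration):

```python
import pandas as pd

# Hypothetical table with the values 0..9 in a single column.
numbers = pd.DataFrame({"n": range(10)})

# Pass the number of rows you want instead of relying on the default of 5.
first_three = numbers.head(3)
last_two = numbers.tail(2)

print(first_three["n"].tolist())
print(last_two["n"].tolist())
```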


In [49]:
df.dtypes

sepal.length    float64
sepal.width     float64
petal.length    float64
petal.width     float64
variety          object
dtype: object

When asking for the dtypes, no parentheses are used! dtypes is an attribute of a DataFrame and Series.

Attributes of a DataFrame or Series do not need parentheses.

Attributes represent a characteristic of a DataFrame/Series, whereas a method (which requires parentheses) does something with the DataFrame/Series.
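The difference shows up as soon as an attribute is called like a method. The sketch below uses a small hypothetical table and the shape attribute, which holds the number of rows and columns; the flag `shape_call_failed` is only there to make the outcome visible.

```python
import pandas as pd

df = pd.DataFrame({"Age": [22, 35, 58], "Sex": ["male", "male", "female"]})

# Attributes are read without parentheses.
print(df.shape)
print(df.dtypes)

# Calling an attribute as if it were a method fails, because the value
# it holds (here a tuple) is not callable.
shape_call_failed = False
try:
    df.shape()
except TypeError as err:
    shape_call_failed = True
    print("TypeError:", err)
```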

In [50]:
df.to_excel("iris.xlsx", sheet_name="data", index=False)
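Where the read_* functions load data into pandas, the to_* methods store it. Setting index=False leaves the row labels out of the output file. The sketch below shows the same round trip with to_csv() and read_csv() rather than Excel, an assumption made so that it runs without an Excel engine installed; the two-row table and the file name `iris_sample.csv` are hypothetical.

```python
import os
import tempfile

import pandas as pd

sample = pd.DataFrame(
    {
        "variety": ["Setosa", "Virginica"],
        "petal.length": [1.4, 5.1],
    }
)

# Write to a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_sample.csv")

    # index=False: do not write the row labels as an extra column.
    sample.to_csv(path, index=False)

    # Read the file back to check what was stored.
    restored = pd.read_csv(path)

print(list(restored.columns))
print(restored["petal.length"].tolist())
```

Without index=False, the row labels would be written as an additional first column, which shows up as an unnamed column when the file is read back.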