# Essential Guide to Python Pandas
### A Crash Course with Reusable Code Template in Jupyter Notebook

The Pandas library has emerged as one of the most important data wrangling and processing tools for Python developers and data professionals. It allows users to quickly apply data wrangling tasks such as handling missing data, removing duplicate records, merging multiple datasets, and so on. Therefore, the Pandas library has become a must-have tool in the data science toolkit.

In this series of articles, we provide a crash course to get you started using the Pandas library. The course is designed to be a practical guide with real-life examples of the most common data manipulation tasks.

### Who is this Course for

This course is for aspiring data professionals and Python developers who want to learn how to process data in Pandas. We assume you already have a basic working knowledge of the Python programming language and are comfortable running notebooks in Jupyter.

To follow the examples in this course, you can copy and paste the code snippets into your Jupyter notebook environment.

### What Makes Pandas Special

Pandas is an open-source, free (under a [BSD license](https://en.wikipedia.org/wiki/BSD_licenses)) Python library originally written by [Wes McKinney](https://en.wikipedia.org/wiki/Wes_McKinney). It is a high-level data structure and manipulation tool designed to make data analysis and wrangling fast and easy. The library offers a variety of functions and methods to convert data from different sources into a tabular format. In this format, each record is housed on one row and each column holds a single data type.

Python developers and data professionals can load data from a variety of sources such as SQL databases, JSON responses from APIs, and CSV files, as well as native Python data structures like lists and dictionaries. This flexibility makes Pandas suitable for a wide range of applications, among them machine learning modeling, data visualization, and time series forecasting. The versatile structure also makes it easy to integrate Pandas with other libraries such as scikit-learn.

Early [releases of Pandas](https://pandas.pydata.org/docs/whatsnew/index.html) date back to 2011, and Pandas version 1.0 was released in January 2020.

## Table of Contents

1. [How to Import Pandas Library](#section_1)
2. [Anatomy of Pandas Data Structures](#section_2)
3. [Get Data into and from Pandas](#section_3)
    * Python Native Data Structures
    * Tabular Data Files
    * API Query and JSON Format
    * Web Pages Data
4. [Describe Information in DataFrames](#section_4)
5. [Understand Data Types](#section_5)
6. [Data Cleaning in Pandas](#section_6)
    * Split & Merge Columns
    * Change Columns DataType
    * Rename Columns
    * Drop Rows and Columns
    * Manipulate text content
7. [Pandas Merging & Joining Data](#)
8. [Data Accessing & Aggregation](#)
9. [Pandas Data Visualization](#)
10. [Pandas Analysis Project](#)
    * Collect Data From Multiple Sources
    * Clean Data
    * Join DataFrames
    * Perform Basic Analysis

### 1. How to Import Pandas Library <a class="anchor" id="section_1"></a>
The easiest way to start using the Pandas library is to install the Python [Anaconda](https://www.anaconda.com/products/individual#Downloads) distribution, a cross-platform distribution for data analysis and scientific computing. The distribution includes more than 250 of the most commonly used data science packages and tools, such as Pandas, scikit-learn, and Jupyter. To start using Pandas in your analysis environment, simply run the import command.

In [1]:
# Import Pandas library
import pandas as pd

To check your current version of the Pandas library, you can read the `__version__` attribute.

In [2]:
# Check Pandas version
pd.__version__

'1.2.4'

Great, you are now ready to start learning Pandas library!

**[Back to Top](#title)**

### 2. Anatomy of Pandas Data Structures <a class="anchor" id="section_2"></a>


The two main Pandas data structure objects are **DataFrames** and **Series**. A Pandas DataFrame object is a two-dimensional labeled structure that holds data in rows and columns, similar to a spreadsheet file or relational database table. Each DataFrame column (also called a Pandas Series object) is a one-dimensional labeled structure with a descriptive name and a single data type that applies to all values in that column. ***In other words, you can think of a DataFrame as a collection of Series.***

Both DataFrame and Series objects have index keys that can be used to reference corresponding values. Index keys are created automatically and can be manipulated by the user to assign specific values as DataFrame or Series index. 

In the example below, we see a Pandas DataFrame object about countries. The DataFrame consists of four different Series objects (country_name, capital_city, population, area_km2) with index values representing each country’s ISO code. Later on, you will learn how you can use the DataFrame index to select specific data values. 

The image below demonstrates the structure of Pandas DataFrame and Series objects.

<img src='Images/DataFrame Stracture.png' class="center"/>
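
To make the relationship between the two objects concrete, here is a minimal sketch that builds two Series with a shared index and combines them into a DataFrame. The variable name `countries_sketch` is made up for this illustration; the values are taken from the country examples used later in this course.

```python
import pandas as pd

# Two Series sharing the same index labels (ISO country codes)
population = pd.Series([4783063, 58558270], index=['NZ', 'ZA'], name='population')
area_km2 = pd.Series([270467, 1221037], index=['NZ', 'ZA'], name='area_km2')

# Combine the Series into a DataFrame; each Series becomes one column
countries_sketch = pd.DataFrame({'population': population, 'area_km2': area_km2})

# Selecting a single column returns a Series again
column = countries_sketch['population']
print(type(column).__name__)         # Series
print(list(countries_sketch.index))  # ['NZ', 'ZA']
```

Both the DataFrame and the Series it returns carry the same index labels, which is what lets you reference values by country code.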

**[Back to Top](#title)**

### 3. Getting Data into and from Pandas <a class="anchor" id="section_3"></a>

The Pandas library is designed to access data from a wide variety of sources and formats. Popular data sources include tabular files, database tables, third-party APIs, and even native Python data structures. This flexibility is what makes the Pandas library useful for many user groups such as developers and data professionals.

To load data into a Pandas DataFrame, you can use reader functions such as [pandas.read_csv()](). The library also has writer functions such as [pandas.DataFrame.to_csv()]() that export DataFrames to external files.

In each function, you can use a set of parameters to pass specific information about your dataset. For instance, in the [pandas.read_csv()]() and [pandas.read_table()]() functions, you can use the sep parameter to identify the delimiter that separates your data values. 
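
As a small sketch of the sep parameter, the snippet below reads semicolon-delimited text. To keep it self-contained, the text is held in memory with `io.StringIO` rather than in a real file; the sample rows and the variable names are assumptions made for this illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited text standing in for a file on disk
raw_text = "country;capital_city\nNew Zealand;Wellington\nSouth Africa;Pretoria\n"

# sep tells read_csv which character separates the values
capitals = pd.read_csv(io.StringIO(raw_text), sep=';')

print(capitals.shape)          # (2, 2)
print(list(capitals.columns))  # ['country', 'capital_city']
```

With the default comma separator, the same text would be read as a single column, so it is worth checking the delimiter whenever the parsed shape looks wrong.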

The figure below shows a list of available reader and writer functions.

<img src='Images/Pandas_io_readwrite.svg' class="center"/>

Source: [Pandas IO Tools]()

In the following sections, we will learn about some of the most commonly used methods to get data into and from a Pandas DataFrame.

**[Back to Top](#title)**

#### 3.1 Python Native Data Structures
The Python programming language has a variety of built-in data structures such as lists, tuples, dictionaries, strings, and sets. These data structures are ideal for storing data during program execution. However, they are not well suited to analytical tasks such as exploratory analysis and data visualization. The Pandas library can convert Python data structures into DataFrame objects so that users can easily perform data manipulation and analytics.

For example, imagine we have a Python dictionary to save country information such as below: 


In [3]:
{'country_name':'New Zealand',
 'capital_city':'Wellington',
 'country_code':'NZ',
 'population':4783063,
 'area_km2':270467}

{'country_name': 'New Zealand',
 'capital_city': 'Wellington',
 'country_code': 'NZ',
 'population': 4783063,
 'area_km2': 270467}

The dictionary key represents the attribute label or title while the dictionary value represents the corresponding information. Pandas library can convert a list of similar dictionaries into a DataFrame object as shown in the example below:

In [31]:
# Create a list of dictionaries
list_of_countries = [
{'country_name':'China','capital_city':'Beijing','population':1433783686,'area_km2':9596961},
{'country_name':'New Zealand','capital_city':'Wellington','population':4783063,'area_km2':270467},
{'country_name':'South Africa','capital_city':'Pretoria','population':58558270,'area_km2':1221037},
{'country_name':'United Kingdom','capital_city':'London','population':67530172,'area_km2':242495},
{'country_name':'United States','capital_city':'Washington DC','population':329064917,'area_km2':9525067}]

# Create a Pandas DataFrame from a list of dictionaries
countries = pd.DataFrame(list_of_countries, index = ['CN','NZ','ZA','GB','US'])

countries.head()

Unnamed: 0,country_name,capital_city,population,area_km2
CN,China,Beijing,1433783686,9596961
NZ,New Zealand,Wellington,4783063,270467
ZA,South Africa,Pretoria,58558270,1221037
GB,United Kingdom,London,67530172,242495
US,United States,Washington DC,329064917,9525067


In the code above, the variable list_of_countries is a Python list in which each element is a dictionary of country information. We then use the pd.DataFrame constructor to convert the list of countries into a Pandas DataFrame called countries. pd.DataFrame can construct DataFrame objects from scratch or from native Python data structures.

Notice how we used the index parameter to pass a list of country codes as our DataFrame index values. We can then examine the new DataFrame object using the head() method, which returns the top rows. We will learn later about the different ways to examine DataFrame content.

We notice how the dictionary keys were assigned as the DataFrame column names while the dictionary values were assigned as the cells. Had we not passed the index parameter, a numerical index with values from 0 to 4 would have been assigned automatically to the DataFrame object.
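
The difference between the automatic index and a user-supplied one can be sketched as follows. The two-row list and the variable names are assumptions for this illustration only.

```python
import pandas as pd

rows = [{'country_name': 'New Zealand', 'capital_city': 'Wellington'},
        {'country_name': 'South Africa', 'capital_city': 'Pretoria'}]

# Without the index parameter, Pandas assigns 0, 1, ... automatically
default_index_df = pd.DataFrame(rows)
print(list(default_index_df.index))  # [0, 1]

# With the index parameter, the labels we pass become the index
custom_index_df = pd.DataFrame(rows, index=['NZ', 'ZA'])
print(custom_index_df.loc['NZ', 'capital_city'])  # Wellington
```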

Another approach is to use a Python dictionary where keys represent column names and values represent Python lists. You can then make use of Pandas [from_dict](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html) function to transfer the dictionary into a DataFrame object as shown in this example:

In [5]:
dictionary_of_countries = {'country_name': ['China', 'New Zealand', 'South Africa', 'United Kingdom', 'United States'],
                           'country_code': ['CN', 'NZ', 'ZA', 'GB', 'US'],
                           'capital_city': ['Beijing', 'Wellington', 'Pretoria', 'London', 'Washington DC'],
                           'population': [1433783686, 4783063, 58558270, 67530172, 329064917],
                           'area_km2': [9596961, 270467, 1221037, 242495, 9525067]}

countries = pd.DataFrame.from_dict(dictionary_of_countries)

countries.head()

Unnamed: 0,country_name,country_code,capital_city,population,area_km2
0,China,CN,Beijing,1433783686,9596961
1,New Zealand,NZ,Wellington,4783063,270467
2,South Africa,ZA,Pretoria,58558270,1221037
3,United Kingdom,GB,London,67530172,242495
4,United States,US,Washington DC,329064917,9525067


The above examples demonstrated the flexibility of transforming data stored in Python native data structures into Pandas DataFrame objects. 

**[Back to Top](#title)**

#### 3.2 Tabular Data Files
Tabular data is usually structured into rows and columns and presented in various file formats including CSV, tab-delimited files, fixed-width formats, and spreadsheets. Tabular files can be accessed from the local computer or online. 

In this section, we will learn how to access data from CSV files, Excel spreadsheet files, and SQL tables. First, we access a CSV file of alcohol consumption by country from the fivethirtyeight [GitHub]() repository. To do that, we use the [read_csv()]() function and pass the file's URL on GitHub. If the CSV file is stored on the local machine, we pass the file path instead.


In [6]:
# Create a DataFrame object using read_csv() function
alcohol_data = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv')

alcohol_data.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
0,Afghanistan,0,0,0,0.0
1,Albania,89,132,54,4.9
2,Algeria,25,0,14,0.7
3,Andorra,245,138,312,12.4
4,Angola,217,57,45,5.9


Another commonly used tabular data format is spreadsheets. Pandas library provides the [read_excel()]() built-in function to access Microsoft Excel spreadsheet files as shown in the example below:

In [7]:
# Create a DataFrame object using read_excel() function
pd.read_excel("path_to_file/myFile.xls", sheet_name="Sheet1")

FileNotFoundError: [Errno 2] No such file or directory: 'path_to_file/myFile.xls'

Notice how the [read_excel()]() example makes use of the sheet_name parameter to tell the system which sheet name contains the needed dataset. For a complete list of all parameters for each built-in function, check the Pandas official documentation by clicking the function name. 

Another common scenario is to query relational database tables using SQL. To do so, you need to provide the necessary credentials and metadata to establish a connection with the database server. You can then use the [pandas.read_sql()]() function to pass the SQL query and load the result into a Pandas DataFrame object.

To simulate this scenario, the following code will create a local database using the Python SQLite engine. We will then use Pandas to access the data using SQL queries. 

In [8]:
# Import SQLite library
import sqlite3

# Assign the database file name
db_path = 'local_db_example.db'

# Connect to the database (the file is created if it does not exist)
conn = sqlite3.connect(db_path)

# Create a cursor to execute SQL statements
c = conn.cursor()

# Create a database table
c.execute("""CREATE TABLE mytable
         (id, name, position)""")


# Add some data
c.execute("""INSERT INTO mytable (id, name, position)
          values(1, 'James', 'Data Scientist')""")

c.execute("""INSERT INTO mytable (id, name, position)
          values(2, 'Mary', 'Software Developer')""")

c.execute("""INSERT INTO mytable (id, name, position)
          values(3, 'Max', 'Data Engineer')""")

# Commit changes and close the connection
conn.commit()

conn.close()

The database file `local_db_example.db` should now appear in the same location as your notebook. It includes dummy data describing employee details. The following code queries the data into a Pandas DataFrame object.

In [9]:
# Identify the database name
database = "local_db_example.db"

# Establish a connection with the database file
conn = sqlite3.connect(database)

# Use Pandas function to pass SQL query and create a DataFrame object
people = pd.read_sql("select * from mytable", con=conn)

# Print the generated DataFrame
print(people)

# Close the connection
conn.close()

   id   name            position
0   1  James      Data Scientist
1   2   Mary  Software Developer
2   3    Max       Data Engineer


In the above example, we created a local database file and used the Pandas library to query the data using SQL, and passed the results into a Pandas DataFrame object. In more practical examples, you may need to query data from relational databases that are stored on remote servers or in the cloud.
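
When a query depends on a value, the `params` argument of `pd.read_sql()` can pass that value separately from the SQL text. The sketch below assumes an in-memory SQLite database (so no file is written) and reuses the dummy employee rows from above; the variable name `data_people` is made up for this illustration.

```python
import sqlite3
import pandas as pd

# In-memory database, so no file is left on disk (an assumption for this sketch)
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE mytable (id, name, position)")
conn.executemany(
    "INSERT INTO mytable (id, name, position) VALUES (?, ?, ?)",
    [(1, 'James', 'Data Scientist'),
     (2, 'Mary', 'Software Developer'),
     (3, 'Max', 'Data Engineer')])
conn.commit()

# The ? placeholder is filled from params, keeping the value out of the SQL string
data_people = pd.read_sql(
    "SELECT * FROM mytable WHERE position LIKE ? ORDER BY id",
    con=conn, params=('Data%',))

conn.close()

print(data_people['name'].tolist())  # ['James', 'Max']
```

The placeholder style depends on the database driver; `?` is the style used by the sqlite3 module.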

**[Back to Top](#title)**

#### 3.3 API Query and JSON Format
When working on daily tasks, data professionals often need to access data from third-party APIs. This approach is common when the data is continually updated like weather forecasting or when you need to select a small subset of data. API response data usually comes in JSON format which you can think of as a collection of Python data structures like dictionaries and lists represented as text. 

For example, we will use API data from [open-notify]() to get information about the International Space Station (ISS). The API gives information about the space station location, altitude, and crew members. The following code queries the current crew members onboard the ISS. In this example, we use the Python requests library to establish a connection with the API.


In [10]:
# Import requests library to handle API connection
import requests

# Import and initialize Data pretty printer library
import pprint
pp = pprint.PrettyPrinter(indent=4)

# Pass the API query using requests library
response = requests.get("http://api.open-notify.org/astros.json")
# print(response.status_code)

# Convert response data into JSON format
response_data = response.json()

Once we have the API response data, we notice the response includes a list of dictionaries about the astronauts currently aboard the ISS. We can convert this part of the response into a Pandas DataFrame object as shown in the example below:

In [11]:
# Examine the response data
pp.pprint(response_data)

# Create a DataFrame of astronauts currently aboard the ISS
astronauts = pd.DataFrame(response_data['people'])

astronauts

{   'message': 'success',
    'number': 10,
    'people': [   {'craft': 'ISS', 'name': 'Mark Vande Hei'},
                  {'craft': 'ISS', 'name': 'Oleg Novitskiy'},
                  {'craft': 'ISS', 'name': 'Pyotr Dubrov'},
                  {'craft': 'ISS', 'name': 'Thomas Pesquet'},
                  {'craft': 'ISS', 'name': 'Megan McArthur'},
                  {'craft': 'ISS', 'name': 'Shane Kimbrough'},
                  {'craft': 'ISS', 'name': 'Akihiko Hoshide'},
                  {'craft': 'Tiangong', 'name': 'Nie Haisheng'},
                  {'craft': 'Tiangong', 'name': 'Liu Boming'},
                  {'craft': 'Tiangong', 'name': 'Tang Hongbo'}]}


Unnamed: 0,name,craft
0,Mark Vande Hei,ISS
1,Oleg Novitskiy,ISS
2,Pyotr Dubrov,ISS
3,Thomas Pesquet,ISS
4,Megan McArthur,ISS
5,Shane Kimbrough,ISS
6,Akihiko Hoshide,ISS
7,Nie Haisheng,Tiangong
8,Liu Boming,Tiangong
9,Tang Hongbo,Tiangong
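
Because a live API may be unavailable or return different data over time, the same conversion can be sketched offline. The snippet below assumes a hard-coded JSON string with the same shape as the response above (shortened to three astronauts) and shows how JSON text becomes Python dictionaries and lists, and then a DataFrame.

```python
import json
import pandas as pd

# Hypothetical response text with the same shape as the API output above
response_text = '''
{"message": "success", "number": 3,
 "people": [{"craft": "ISS", "name": "Mark Vande Hei"},
            {"craft": "ISS", "name": "Thomas Pesquet"},
            {"craft": "Tiangong", "name": "Nie Haisheng"}]}
'''

# json.loads turns the JSON text into Python dictionaries and lists
parsed = json.loads(response_text)

# The list under the 'people' key maps directly onto DataFrame rows
crew = pd.DataFrame(parsed['people'])

# Count crew members per craft
print(crew['craft'].value_counts().to_dict())  # {'ISS': 2, 'Tiangong': 1}
```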


**[Back to Top](#title)**

#### 3.4 Web Pages Data
The Pandas library offers a built-in function that parses HTML tables from web pages into a list of Pandas DataFrames. This functionality gives users a fast way to access data tables embedded in a web page's HTML code. To demonstrate the process, we will use the Pandas function [read_html()]() to parse the [list of countries by population table]() from Wikipedia into a DataFrame object as shown in the example below:

In [14]:
# Parse every HTML table on the page into a list of DataFrames
web_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)')

type(web_data)

list

When examining the type of web_data variable, we notice the [read_html()]() function has returned a list of five elements representing the table tags detected in the webpage HTML code. Each table tag was automatically converted into a Pandas DataFrame object. 

However, not all tables are useful as they may contain unwanted HTML data. Therefore, we must carefully examine the returned list and identify the useful DataFrame objects. 

In this example, we notice the first item `web_data[0]` contains the needed countries table. Therefore, we can assign the first item to a new variable for easier use as shown in the code below:

In [24]:
web_countries_table = web_data[0]

web_countries_table.head()

Unnamed: 0,Country/Area,UN continentalregion[4],UN statisticalsubregion[4],Population(1 July 2018),Population(1 July 2019),Change
0,China[a],Asia,Eastern Asia,1427647786,1433783686,+0.43%
1,India,Asia,Southern Asia,1352642280,1366417754,+1.02%
2,United States,Americas,Northern America,327096265,329064917,+0.60%
3,Indonesia,Asia,South-eastern Asia,267670543,270625568,+1.10%
4,Pakistan,Asia,Southern Asia,212228286,216565318,+2.04%


By examining the first few rows of our DataFrame, we notice the parsed table includes some unwanted strings such as brackets `[ ]` and parentheses `( )` in country names and column titles. 

Also, we notice the column Change includes both numbers and symbols such as + and %, so Pandas stores the column as text rather than as numbers (we will learn more about Pandas data types soon). These issues are normal for data accessed from web pages, which may not be ready for analysis as parsed.

Luckily, the Pandas library includes other tools and functions that help the user clean up the data for analysis. Using the [read_html()]() function can save time compared with extracting web data using more traditional web scraping libraries such as Beautiful Soup. In later examples, we will learn more about how to convert similar tables from web pages into DataFrames ready for data analysis.
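
As a preview of that clean-up, here is a minimal sketch on a two-row stand-in for the parsed table (the DataFrame name `sample` is an assumption; the values come from the output above). It removes bracketed footnote markers and turns the percentage text into numbers using Pandas string methods.

```python
import pandas as pd

# A small stand-in for the parsed table, using values from the output above
sample = pd.DataFrame({'Country/Area': ['China[a]', 'India'],
                       'Change': ['+0.43%', '+1.02%']})

# Remove bracketed footnote markers such as [a] from the country names
sample['Country/Area'] = sample['Country/Area'].str.replace(r'\[.*?\]', '', regex=True)

# Strip the percent sign, then convert the remaining text to numbers
sample['Change'] = sample['Change'].str.rstrip('%').astype(float)

print(sample['Country/Area'].tolist())  # ['China', 'India']
print(sample['Change'].tolist())        # [0.43, 1.02]
```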

In addition to the above scenarios for creating DataFrame objects using reader functions, the Pandas library also provides a set of writer functions to save DataFrame objects as external datasets.
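
A writer function can be sketched with `to_csv()`. To avoid creating a file, the example below omits the path so that the CSV text is returned as a string, then reads it back to confirm the round trip. The DataFrame contents and the variable names are assumptions for this illustration.

```python
import io
import pandas as pd

df_out = pd.DataFrame({'country_name': ['New Zealand', 'South Africa'],
                       'population': [4783063, 58558270]},
                      index=['NZ', 'ZA'])

# Without a path, to_csv returns the CSV text instead of writing a file
csv_text = df_out.to_csv(index_label='country_code')

# Reading the text back restores the same rows and index labels
df_back = pd.read_csv(io.StringIO(csv_text), index_col='country_code')

print(list(df_back.index))               # ['NZ', 'ZA']
print(df_back.loc['NZ', 'population'])   # 4783063
```

Passing a file name such as a local path as the first argument of `to_csv()` would write the same text to disk instead.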

**[Back to Top](#title)**

### 4. Describe Information in DataFrames <a class="anchor" id="section_4"></a>

In the previous section, we learned how to use Pandas reader functions to create DataFrame objects from different data sources. Once the data is loaded, users can make use of several built-in attributes and methods designed to examine the state of a DataFrame. They provide information about the DataFrame size, data types, and missing values, in addition to basic summary statistics.

This information is important for users to identify what changes they would need to make to prepare the DataFrame object for further analysis. 

To demonstrate commands, we will use Pandas reader functions to import some publicly available datasets from GitHub.

In [25]:
# Load the alcohol consumption dataset from GitHub
alcohol_data = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv')

alcohol_data.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
0,Afghanistan,0,0,0,0.0
1,Albania,89,132,54,4.9
2,Algeria,25,0,14,0.7
3,Andorra,245,138,312,12.4
4,Angola,217,57,45,5.9


One of the first questions to answer when working with a new dataset is to know the size of the data, i.e., how many rows and columns we have. 

To answer this question, we can use the shape attribute which returns a Python tuple representing the dimensionality of DataFrame objects. The first value represents the number of records while the second value counts the number of columns. 

In [None]:
# Check the dimension of alcohol_data DataFrame
alcohol_data.shape

Another commonly used attribute is size, which can be used to identify how many elements we have in a given DataFrame or Series object (including missing values).

In [None]:
# How many elements in alcohol_data DataFrame
alcohol_data.size

# How many elements in alcohol_data['country'] Series
alcohol_data['country'].size
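
The relationship between shape and size, and the fact that size includes missing values, can be checked on a tiny DataFrame. The three rows below are a hypothetical sample in which one value has been deliberately set to missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one deliberately missing value
small = pd.DataFrame({'country': ['Albania', 'Algeria', 'Andorra'],
                      'wine_servings': [54, 14, np.nan]})

rows, columns = small.shape
print(small.shape)                     # (3, 2)
print(small.size)                      # 6
print(small['country'].size)           # 3

# count() skips missing values, while size does not
print(small['wine_servings'].count())  # 2
```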

Once we have learned about the size of our DataFrame and Series objects, we can learn more details using the `info()` method. It provides a useful summary of the DataFrame columns as shown in the example below:

In [26]:
alcohol_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 5 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   country                       193 non-null    object 
 1   beer_servings                 193 non-null    int64  
 2   spirit_servings               193 non-null    int64  
 3   wine_servings                 193 non-null    int64  
 4   total_litres_of_pure_alcohol  193 non-null    float64
dtypes: float64(1), int64(3), object(1)
memory usage: 7.7+ KB


The results first highlight the number of records in the DataFrame and the range of the numerical index automatically assigned to this DataFrame. Next comes the total number of columns (5 in our dataset), followed by each column name with its data type and the number of values in that column that are not null.

In this dataset, it seems we don’t have any missing values since the non-null count of every column equals the number of records. We notice the data type of the country column is object, which represents text values, while the three servings columns (`beer_servings`, `spirit_servings`, and `wine_servings`) have the int64 data type, which represents integer numbers, and the total litres column is assigned the float64 data type, which allows real numbers.

At this stage, we have an idea about what changes we need to make in order to have the correct data types. For example, numerical data types such as int64 allow us to apply mathematical calculations on the values while object data type allows us to apply text formatting functions. In the next section about data cleaning, we will learn how to change data types. 

Finally, the output shows how many columns there are for each data type and the memory size of this DataFrame (the memory size can be useful when working with a large DataFrame that you may wish to optimize).

The other exploratory method, [describe()](), returns basic summary statistics of the DataFrame numeric columns as shown in the example below:

In [27]:
alcohol_data.describe()

Unnamed: 0,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
count,193.0,193.0,193.0,193.0
mean,106.160622,80.994819,49.450777,4.717098
std,101.143103,88.284312,79.697598,3.773298
min,0.0,0.0,0.0,0.0
25%,20.0,4.0,1.0,1.3
50%,76.0,56.0,8.0,4.2
75%,188.0,128.0,59.0,7.2
max,376.0,438.0,370.0,14.4


From the example above, we notice the [describe()]() function was only applied to the numerical columns and the country name column was ignored. This is because descriptive statistics are based on numerical columns only to summarize the central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values.
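
Both behaviors, skipping text columns and excluding NaN values, can be seen in a small sketch. The three-row DataFrame below is a hypothetical sample with one value set to missing. Calling `describe()` directly on a text Series returns a different summary made of counts and the most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one deliberately missing value
drinks = pd.DataFrame({'country': ['Albania', 'Algeria', 'Andorra'],
                       'beer_servings': [89.0, 25.0, np.nan]})

# Only the numeric column is summarized, and NaN is left out of the statistics
summary = drinks.describe()
print(list(summary.columns))                  # ['beer_servings']
print(summary.loc['count', 'beer_servings'])  # 2.0
print(summary.loc['mean', 'beer_servings'])   # 57.0

# Describing a text Series reports count, unique, top, and freq instead
text_summary = drinks['country'].describe()
print(list(text_summary.index))  # ['count', 'unique', 'top', 'freq']
```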

In addition to the numerical statistical summary, you can also explore the features of text values in DataFrames. For this exercise, we will use the [country codes dataset]() from the [Open Data GitHub repository](). The data include many details about each country's international codes and geographic regions.

In [28]:
# Load the country codes dataset from GitHub
countries_data = pd.read_csv('https://raw.githubusercontent.com/datasets/country-codes/master/data/country-codes.csv')

countries_data.head()

Unnamed: 0,FIFA,Dial,ISO3166-1-Alpha-3,MARC,is_independent,ISO3166-1-numeric,GAUL,FIPS,WMO,ISO3166-1-Alpha-2,...,Sub-region Name,official_name_ru,Global Name,Capital,Continent,TLD,Languages,Geoname ID,CLDR display name,EDGAR
0,TPE,886,TWN,ch,Yes,158.0,925,TW,,TW,...,,,,Taipei,AS,.tw,"zh-TW,zh,nan,hak",1668284.0,Taiwan,
1,AFG,93,AFG,af,Yes,4.0,1,AF,AF,AF,...,Southern Asia,Афганистан,World,Kabul,AS,.af,"fa-AF,ps,uz-AF,tk",1149361.0,Afghanistan,B2
2,ALB,355,ALB,aa,Yes,8.0,3,AL,AB,AL,...,Southern Europe,Албания,World,Tirana,EU,.al,"sq,el",783754.0,Albania,B3
3,ALG,213,DZA,ae,Yes,12.0,4,AG,AL,DZ,...,Northern Africa,Алжир,World,Algiers,AF,.dz,ar-DZ,2589581.0,Algeria,B4
4,ASA,1-684,ASM,as,Territory of US,16.0,5,AQ,,AS,...,Polynesia,Американское Самоа,World,Pago Pago,OC,.as,"en-AS,sm,to",5880801.0,American Samoa,B5


For instance, the column Region Name appears to be a text column that holds the geographical region of each country. To list the distinct region values and count how often each one occurs, we can apply Series methods like [unique()]() and [value_counts()]() as shown below:

In [29]:
countries_data['Region Name'].unique()

array([nan, 'Asia', 'Europe', 'Africa', 'Oceania', 'Americas'],
      dtype=object)

In [30]:
countries_data['Region Name'].value_counts()

Africa      60
Americas    57
Europe      52
Asia        50
Oceania     29
Name: Region Name, dtype: int64
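
Notice that `unique()` lists nan among the values while `value_counts()` leaves it out. The sketch below, on a hypothetical four-value Series, shows how the `dropna=False` argument brings missing values into the counts.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing region
regions = pd.Series(['Asia', 'Europe', 'Asia', np.nan], name='Region Name')

# nunique() and value_counts() skip missing values by default
print(regions.nunique())                  # 2
print(regions.value_counts().to_dict())   # {'Asia': 2, 'Europe': 1}

# dropna=False adds a row for the missing values
with_missing = regions.value_counts(dropna=False)
print(int(with_missing.sum()))            # 4

# isna() counts the missing values directly
print(int(regions.isna().sum()))          # 1
```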

**[Back to Top](#title)**

### 5. Understanding Data Types <a class="anchor" id="section_5"></a>

It is important to assign the correct data type to each DataFrame column in order to avoid problems during data analysis. Pandas will try to infer the correct data type for each column. For the full mapping of Pandas data types, please refer to the official Pandas documentation.

However, sometimes you need to change the data type manually. Selecting the correct data type will allow you to perform further analysis such as mathematical analysis on numeric columns and text formatting on object data types. The following table presents common Pandas data types:

| Pandas DataType | Usage |
| --- | --- |
| object | Text or mixed numeric and non-numeric values |
| int64 | Integer numbers |
| float64 | Floating point numbers |
| bool | True/False values |
| datetime64 | Date and time values |
| timedelta[ns] | Differences between two datetimes |
| category | Finite list of text values |
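
To see these data types in practice, the sketch below builds a small DataFrame with one column per type and prints the `dtypes` attribute. The column names and values are assumptions for this illustration, and the exact labels printed for text and date columns can vary between Pandas versions.

```python
import pandas as pd

# Hypothetical employee data with one column per common data type
typed = pd.DataFrame({
    'name': ['James', 'Mary'],                                # text
    'age': [31, 45],                                          # integers
    'salary': [5200.5, 6100.0],                               # floating point
    'active': [True, False],                                  # booleans
    'start': pd.to_datetime(['2020-01-15', '2021-06-01']),    # dates
})

# Subtracting dates produces a timedelta column
typed['tenure'] = pd.Timestamp('2022-01-01') - typed['start']

# A category column stores a finite list of text values
typed['team'] = pd.Series(['data', 'web'], dtype='category')

print(typed.dtypes)
```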

