# <center> Getting Started with Pandas </center>

- [Import Pandas Library](#section_1)
- [Check Library Version](#section_2)
- [Anatomy of Pandas Data Structures](#section_3)
- [Understand Pandas Data Types](#section_4)
- [Pandas Syntax Structure](#section_5)

<hr>

### How to Import Pandas Library <a class="anchor" id="section_1"></a>

The easiest way to start using the Pandas library is to install the Anaconda distribution of Python, a cross-platform distribution for data analysis and scientific computing. It ships with more than 250 commonly used data science packages and tools, including Pandas, scikit-learn, and Jupyter.


To start using Pandas in your analysis environment, you need to run an `import` statement. By convention, the library is imported under the alias `pd`.

In [1]:
# Import Pandas library
import pandas as pd

### Check Current Version of Pandas Library <a class="anchor" id="section_2"></a>

To check which version of the Pandas library is installed, read the library's `__version__` attribute.

In [2]:
# Check Pandas version
pd.__version__

'1.2.5'

### Anatomy of Pandas Data Structures <a class="anchor" id="section_3"></a>

The three main Pandas data structures are:


- **DataFrame**: a two-dimensional labeled structure that holds data in rows and columns.
- **Series**: a one-dimensional labeled structure with a descriptive name and a single data type.
- **Index**: an object that stores the axis labels for all Pandas objects.


The countries DataFrame used throughout this guide consists of five Series objects ("country_code", "country_name", "capital_city", "population_size", "national_day") with index values from 0 to 4. Later on, you will learn how to use the DataFrame index to select specific data values.
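As a minimal sketch of how the three structures relate, the snippet below builds a small DataFrame by hand from three rows and three columns of the countries data. The variable names `df` and `population` are only illustrative.

```python
import pandas as pd

# Small DataFrame built by hand from a few rows of the countries example
df = pd.DataFrame(
    {
        "country_code": ["CN", "AR", "IN"],
        "country_name": ["China", "Argentina", "India"],
        "population_size": [1440297825, 45267449, 1382345085],
    }
)

# Selecting one column returns a Series that carries the column name
population = df["population_size"]

print(type(df).__name__)          # DataFrame
print(type(population).__name__)  # Series
print(population.name)            # population_size
print(type(df.index).__name__)    # RangeIndex, a kind of Index
print(list(df.index))             # [0, 1, 2]
```

Each column of a DataFrame is a Series, and both objects share the same Index.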

### Pandas Data Types <a class="anchor" id="section_4"></a>

In our countries DataFrame example, there are five columns, each of which is a Series object. Some hold text, some hold numbers, and one holds dates.

Pandas works with seven main data types, and it is important for data professionals to understand when to use each one.

The table below highlights these seven data types and the main use case for each.

<br>

<img src="data-type.jpg" class="center"/>
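To see which data type each column has, you can inspect the `dtypes` attribute. The sketch below uses two rows of the countries data and assumes that their dates are written as month/day/year; the exact type names printed can vary between Pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country_name": ["China", "Argentina"],
        "population_size": [1440297825, 45267449],
        "national_day": ["10/1/1949", "7/9/1816"],
    }
)

# Dates read from text stay as strings until they are converted explicitly.
# Assumption: these two values are month/day/year.
df["national_day"] = pd.to_datetime(df["national_day"], format="%m/%d/%Y")

# One data type per column: text, integer, and datetime
print(df.dtypes)
```

Converting a column to the right type early makes later operations, such as sorting by date, behave as expected.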

### Pandas Functions, Methods, and Attributes <a class="anchor" id="section_5"></a>

In this part, we explain some syntax and vocabulary used in the Pandas library. In particular, we clarify the difference between **Pandas Functions, Methods** and **Attributes**.

**Pandas Functions**

The Pandas library provides built-in functions for a variety of operations, such as reading datasets and merging data objects.

These functions are called directly from the library namespace, so they are usually prefixed with `pandas.` or the alias `pd.`.

In the example below, we use the built-in function [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) to read a CSV file into a DataFrame object and assign it to a variable called `df_countries`.

In [5]:
# Import Sample DataFrame
df_countries = pd.read_csv("sample dataset.csv")

In [6]:
# Display the dataset
df_countries

Unnamed: 0,country_code,country_name,capital_city,population_size,national_day
0,CN,China,Beijing,1440297825,10/1/1949
1,AR,Argentina,Buenos Aires,45267449,7/9/1816
2,IN,India,New Delhi,1382345085,15/7/1947
3,NG,Nigeria,Abuja,206984347,1/10/1960
4,US,United States,Washington,331341050,4/7/1776
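The following sketch shows the same idea without needing a file on disk. It assumes an in-memory string (`csv_text`, a hypothetical stand-in for the dataset file) and also calls `pd.concat()`, another top-level function, to combine data objects.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (two rows from the example dataset)
csv_text = """country_code,country_name,capital_city
CN,China,Beijing
AR,Argentina,Buenos Aires
"""

# pd.read_csv is called from the pandas namespace, not from a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# pd.concat is another top-level function; here it stacks the frame on itself
stacked = pd.concat([df, df], ignore_index=True)

print(df.shape)       # (2, 3)
print(stacked.shape)  # (4, 3)
```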


**Pandas Methods**

Pandas methods are "**action oriented**": they are called on a data object and usually do something to it or with it.

In the example below, we use the [`set_index()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html) method to make the country code column the DataFrame index.
Notice that we pass the column name as an argument to the method. By default, `set_index()` returns a new DataFrame and leaves the original unchanged.

In [None]:
# Use the "country_code" column as the DataFrame index
df_countries.set_index("country_code")
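Because `set_index()` does not modify the DataFrame in place by default, the result has to be assigned to a variable to keep it. A minimal sketch, using a two-row frame and an illustrative name `df_indexed`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country_code": ["CN", "AR"],
        "country_name": ["China", "Argentina"],
    }
)

# set_index returns a new DataFrame; df itself keeps its default index
df_indexed = df.set_index("country_code")

print(list(df.index))          # [0, 1]
print(list(df_indexed.index))  # ['CN', 'AR']

# Label-based lookup now works with the country code
print(df_indexed.loc["AR", "country_name"])  # Argentina
```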

**Pandas Attributes**

Attributes describe a data object and let you look up information about it.

Notice that attributes are written without parentheses, because they are not called and take no parameters.

In the example below, we use the [`shape`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shape.html) attribute to find the dimensions of our DataFrame object. It returns a tuple that tells us how many rows and columns the DataFrame has.

In [8]:
# Display the dimensions of countries DataFrame
df_countries.shape

(5, 5)
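A short sketch contrasting attributes with methods, using a small hand-built frame rather than the full dataset:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country_code": ["CN", "AR", "IN"],
        "population_size": [1440297825, 45267449, 1382345085],
    }
)

# Attributes are read without parentheses
print(df.shape)          # (3, 2): three rows, two columns
print(list(df.columns))  # ['country_code', 'population_size']
print(df.size)           # 6, the total number of cells

# Methods are called with parentheses, even when they take no arguments
print(len(df.head(2)))   # 2
```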