**Pandas**, whose name is commonly read as the “Python Data Analysis” library, is one of the most popular Python libraries for data analysis.

Import **pandas** using the alias **pd**.

In [1]:
import pandas as pd 


### Working with data that already exists

Load data stored in a CSV file into a DataFrame using the **pd.read_csv()** function.

In [2]:
df = pd.read_csv('/kaggle/input/canada-per-capita-income/Canada_per_capita_income.csv') 
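
The path above only exists inside the Kaggle notebook. As a minimal sketch for trying `pd.read_csv()` elsewhere, the example below reads the same two-column layout from an in-memory string. The names `csv_text` and `sample_df` are hypothetical, and the three rows are only a small stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for the CSV file: a header line followed by three data rows.
csv_text = """year,income
1970,3399.299037
1971,3768.297935
1972,4251.175484
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)
```

The first line of the text becomes the column headers, and each later line becomes one row.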

**Show the DataFrame shape**, i.e. the number of rows and columns.

In [3]:
print("Shape of the DataFrame: ", df.shape)

Shape of the DataFrame:  (47, 2)
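
Since `shape` is a plain `(rows, columns)` tuple, it can be unpacked directly. A small sketch, using a hypothetical two-row frame named `demo` rather than the loaded file:

```python
import pandas as pd

# Hypothetical frame used only for illustration.
demo = pd.DataFrame({"year": [2020, 2021], "income": [36000, 38000]})

# shape is a (number of rows, number of columns) tuple.
n_rows, n_cols = demo.shape
print(n_rows, n_cols)

# len() on a DataFrame gives the number of rows.
print(len(demo))
```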


By default, pandas prints a DataFrame in full only if its number of rows does not exceed the `display.max_rows` option.

If the DataFrame has more rows than that limit, only the **headers**, the **first 5 rows**, and the **last 5 rows** are shown.

**Show the current maximum number of displayed rows**

In [4]:
print(pd.options.display.max_rows)

60


**Change the maximum number of displayed rows**

In [5]:
pd.options.display.max_rows = 30

In [6]:
print("Display Content of the DataFrame:\n ", df)

Display Content of the DataFrame:
      year        income
0   1970   3399.299037
1   1971   3768.297935
2   1972   4251.175484
3   1973   4804.463248
4   1974   5576.514583
..   ...           ...
42  2012  42665.255970
43  2013  42676.468370
44  2014  41039.893600
45  2015  35175.188980
46  2016  34229.193630

[47 rows x 2 columns]
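
To see both behaviours side by side without changing the global setting, one option is `pd.option_context`, which applies an option only inside a `with` block. The sketch below assumes the other display options are at their defaults and uses a hypothetical 47-row frame named `demo` with made-up income values.

```python
import pandas as pd

# Hypothetical 47-row frame with the same shape as the one loaded above;
# the income values are placeholders, not real data.
demo = pd.DataFrame({"year": list(range(1970, 2017)),
                     "income": list(range(47))})

# 47 rows > 30, so the middle of the frame is elided.
with pd.option_context("display.max_rows", 30):
    truncated = repr(demo)

# 47 rows <= 60, so every row is printed.
with pd.option_context("display.max_rows", 60):
    full = repr(demo)

print(len(truncated.splitlines()), len(full.splitlines()))
```

Outside the `with` blocks, `pd.options.display.max_rows` keeps whatever value it had before.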


**Show the first 5 rows of a DataFrame using the head() method**

In [7]:
df.head()

   year       income
0  1970  3399.299037
1  1971  3768.297935
2  1972  4251.175484
3  1973  4804.463248
4  1974  5576.514583
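
`head()` also accepts the number of rows to return, and `tail()` does the same from the end of the DataFrame. A brief sketch on a hypothetical five-row frame named `demo`, built from the rows shown above:

```python
import pandas as pd

# Five rows copied from the output above, used as a small stand-in.
demo = pd.DataFrame({
    "year": [1970, 1971, 1972, 1973, 1974],
    "income": [3399.299037, 3768.297935, 4251.175484,
               4804.463248, 5576.514583],
})

first_two = demo.head(2)  # first 2 rows instead of the default 5
last_two = demo.tail(2)   # last 2 rows

print(first_two)
print(last_two)
```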


**Show summary information about a DataFrame** (index range, column names, non-null counts, dtypes and memory usage)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   year    47 non-null     int64  
 1   income  47 non-null     float64
dtypes: float64(1), int64(1)
memory usage: 880.0 bytes


**Generate descriptive statistics of a DataFrame**

In [9]:
df.describe()

            year        income
count       47.0          47.0
mean      1993.0  18920.137063
std    13.711309  12034.679438
min       1970.0   3399.299037
25%       1981.5   9526.914515
50%       1993.0   16426.72548
75%       2004.5   27458.60142
max       2016.0   42676.46837
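
The result of `describe()` is itself a DataFrame, so single statistics can be looked up by row and column label, or computed directly on a column. A minimal sketch on a hypothetical five-row frame named `demo`:

```python
import pandas as pd

# Five rows taken from the dataset shown earlier, as a small stand-in.
demo = pd.DataFrame({
    "year": [1970, 1971, 1972, 1973, 1974],
    "income": [3399.299037, 3768.297935, 4251.175484,
               4804.463248, 5576.514583],
})

summary = demo.describe()

# Look up one statistic in the summary table by (row label, column label).
mean_from_summary = summary.loc["mean", "income"]

# The same statistic computed directly on the column.
mean_direct = demo["income"].mean()

print(mean_from_summary, mean_direct)
```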


---

### Creating new DataFrame
A DataFrame can be created with the **pd.DataFrame()** constructor.

The content of a DataFrame can be given as a Python **dictionary of lists**, where each **key** in the dictionary is a **column name** of the DataFrame, and each **value** is the **list of entries** for that column.

In [10]:
pd.DataFrame({"year":[2020,2021],
              "income":[36000,38000]})

   year  income
0  2020   36000
1  2021   38000


By default, the row index of a new DataFrame ascends from 0 (0, 1, 2, 3, ...).
It's also possible to assign the desired row index when creating a DataFrame using the `index` argument.

In [11]:
pd.DataFrame({"year":[2020,2021],
              "income":[36000,38000]},
             index=[1,2])

   year  income
1  2020   36000
2  2021   38000


---

### Indexing and selecting data

Access a column of a DataFrame using the `.` (attribute) or `[]` (indexing) operator.

In [12]:
df.year

0     1970
1     1971
2     1972
3     1973
4     1974
      ... 
42    2012
43    2013
44    2014
45    2015
46    2016
Name: year, Length: 47, dtype: int64

In [13]:
df['income']

0      3399.299037
1      3768.297935
2      4251.175484
3      4804.463248
4      5576.514583
          ...     
42    42665.255970
43    42676.468370
44    41039.893600
45    35175.188980
46    34229.193630
Name: income, Length: 47, dtype: float64

The indexing operator `[]` has the advantage that it can handle column names that are not valid Python identifiers, such as names containing spaces (e.g. if we had an `income in year` column, `df.income in year` wouldn't work, but `df['income in year']` would).
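
A small sketch of this point, using a hypothetical frame named `demo`. It also includes a column called `count` to illustrate a related case: a column name that clashes with an existing DataFrame method is only reachable through `[]`.

```python
import pandas as pd

# Hypothetical frame: one column name contains spaces, the other
# clashes with the DataFrame.count() method.
demo = pd.DataFrame({"income in year": [36000, 38000],
                     "count": [1, 2]})

# Bracket access works for any column name.
incomes = demo["income in year"]
counts = demo["count"]

# Writing demo.income in year would be a SyntaxError, and demo.count
# refers to the method, not the column.
print(incomes.tolist(), counts.tolist(), callable(demo.count))
```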

Access a single value of a column in a DataFrame by indexing the column with the row label.

In [14]:
df['year'][0]

1970

In [15]:
df['income'][1]

3768.297935
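
The chained form above first selects a column and then a row label. As an alternative sketch, `.loc` and `.at` take the row label and the column name in a single step; the frame `demo` below is a hypothetical two-row stand-in for `df`.

```python
import pandas as pd

# Two rows copied from the dataset above.
demo = pd.DataFrame({"year": [1970, 1971],
                     "income": [3399.299037, 3768.297935]})

# .loc takes (row label, column name).
first_year = demo.loc[0, "year"]

# .at does the same for a single scalar value.
second_income = demo.at[1, "income"]

print(first_year, second_income)
```

Both forms look rows up by index label, which here happens to match the row position because the default index starts at 0.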