# **INTRODUCTION**

- We have previously looked at how and [why the NumPy library is so useful](https://github.com/Tess-hacker/THE-ULTIMATE-GUIDE-TO-UNDERSTANDING-NumPy-AND-PANDAS/blob/master/BOOLEAN%20INDEXING%20USING%20NUMPY.ipynb) for writing efficient code. NumPy's vectorization, which applies an operation to a whole array at once instead of looping over its elements, is what helps us achieve this.


- However, despite its usefulness, the NumPy library has certain shortcomings:

    - The lack of support for column names forces us to frame questions as multi-dimensional array operations.
    
    - Support for only one data type per ndarray makes it more difficult to work with data that contains both numeric and string data.
    
    - NumPy offers lots of low-level methods, but many common analysis patterns have no pre-built method.


- This is exactly where the pandas library comes in handy for us. For all the shortcomings of the NumPy library which have been highlighted above, the Pandas library provides a solution. 


- **Note that the pandas library is not a replacement for NumPy; rather, it is an extension of the NumPy library.** This means that your knowledge of NumPy will come into play as you learn pandas. If you are not yet comfortable with NumPy, go back to [the first lesson](https://github.com/Tess-hacker/THE-ULTIMATE-GUIDE-TO-UNDERSTANDING-NumPy-AND-PANDAS/blob/master/INTRODUCTION%20TO%20NUMPY.ipynb) and the [second part](https://github.com/Tess-hacker/THE-ULTIMATE-GUIDE-TO-UNDERSTANDING-NumPy-AND-PANDAS/blob/master/BOOLEAN%20INDEXING%20USING%20NUMPY.ipynb) on NumPy to build that foundation.


- The basic data structure of the pandas library is called a **dataframe**. Just as data in NumPy is structured in n-dimensional arrays, data in pandas is structured in dataframes. The differences between a pandas dataframe and a NumPy array are:

    - The axis values of a dataframe can have both string and numeric labels, while NumPy axis values can only have numeric labels.
    
    - A dataframe can contain columns of different data types, such as integers, floats and strings, in the same dataset, while a NumPy array is limited to a single data type at a time.
    
    
- **LET'S GO LEARN PANDAS!!!** It'll be fun, I promise!
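
- Before we move on, here is a small sketch of the two differences listed above. The three companies and their figures come from the dataset we are about to load; the variable names `mixed_array` and `companies` are just illustrative.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type: mixing a string with numbers
# makes NumPy store every element as a string.
mixed_array = np.array(["Walmart", 1, 485873])
print(mixed_array.dtype.kind)  # U (a unicode string type)

# A dataframe keeps one data type per column and has labels on both axes.
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenue_change": [0.8, -4.4, -9.1],
        "country": ["USA", "China", "China"],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)
print(companies.dtypes)
print(companies.loc["State Grid", "country"])  # China
```

- Notice that the dataframe lets us ask for "the country of State Grid" by name, without knowing any position.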

## **INTRODUCTION TO THE DATASET**

- To aid our learning of Pandas, we will be using the [**Fortune magazine's 2017 Global 500 list**](https://raw.githubusercontent.com/Tess-hacker/THE-ULTIMATE-GUIDE-TO-UNDERSTANDING-NumPy-AND-PANDAS/master/FORTUNE's%20500%20LIST.csv).


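- The columns of the dataset that we will use most often are described below (the file also contains other columns, such as `assets` and `employees`, which you will see in the output later on):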

    - `company`: Name of the company.
    
    - `rank`: Global 500 rank for the company.
    
    - `revenues`: Company's total revenue for the fiscal year, in millions of dollars (USD).
    
    - `revenue_change`: Percentage change in revenue between the current and prior fiscal year.
    
    - `profits`: Net income for the fiscal year, in millions of dollars (USD).
    
    - `ceo`: Company's Chief Executive Officer.
    
    - `industry`: Industry in which the company operates.
    
    - `sector`: Sector in which the company operates.
    
    - `previous_rank`: Global 500 rank for the company for the prior year.
    
    - `country`: Country in which the company is headquartered.
    
    
- When importing the Pandas module, we can use the **alias technique** which we learnt [here](https://github.com/Tess-hacker/WORKING-WITH-COMPLEX-DATE-TIME-DATASET-IN-PYTHON/blob/master/DATES%20AND%20TIMES%20IN%20PYTHON.ipynb). Thus, we import pandas conventionally as follows:

    `import pandas as pd`
    
    
- Afterwards, we will read the dataset using the `pandas.read_csv()` function. We will review `read_csv()` in more detail in subsequent lessons, but for now, note that it is **used to read and parse a CSV file into a pandas dataframe**.


- Just like a NumPy array, a pandas dataframe has a `.shape` attribute (an attribute, not a function, so it is written without parentheses) which holds a tuple giving the length of each axis of the object. We'll use that and Python's `type()` function to inspect the Fortune 500 dataframe.


- Let's get started!


In [1]:
import pandas as pd
fortune_500 = pd.read_csv('FORTUNE 500 LIST.csv', index_col=0) # read the dataset, using the first column (the company names) as the row labels
fortune_500.index.name = None # remove the name of the index; the company names themselves remain the row labels
fortune_500type = type(fortune_500) # the type of the object we are about to use
fortune_500shape = fortune_500.shape # the dimensions of our dataset as (rows, columns)
print ("The first 10 rows of the Fortune 500 list are:")
print (fortune_500[:10])
print ('\n')
print ("The type name of the Fortune 500 list is:")
print (fortune_500type)
print ('\n')
print ("The Fortune 500 list shape is:")
print (fortune_500shape)

The first 10 rows of the Fortune 500 list are:
                          rank  revenues  revenue_change  profits  assets  \
Walmart                      1    485873             0.8  13643.0  198825   
State Grid                   2    315199            -4.4   9571.3  489838   
Sinopec Group                3    267518            -9.1   1257.9  310726   
China National Petroleum     4    262573           -12.3   1867.5  585619   
Toyota Motor                 5    254694             7.7  16899.3  437575   
Volkswagen                   6    240264             1.5   5937.3  432116   
Royal Dutch Shell            7    240033           -11.8   4575.0  411275   
Berkshire Hathaway           8    223604             6.1  24074.0  620854   
Apple                        9    215639            -7.7  45687.0  321686   
Exxon Mobil                 10    205004           -16.7   7840.0  330314   

                          profit_change                  ceo  \
Walmart                            -7.2  
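
- As a rough illustration of what `read_csv()` with `index_col=0` does, the sketch below parses a three-row CSV held in a string instead of the real file. The use of `io.StringIO`, the variable names, and the assumption that the first column is called `company` are mine; the figures are copied from the output above.

```python
import io
import pandas as pd

# A tiny stand-in for the CSV file (assumed layout: the first column is `company`).
csv_text = """company,rank,revenues,revenue_change
Walmart,1,485873,0.8
State Grid,2,315199,-4.4
Sinopec Group,3,267518,-9.1
"""

# index_col=0 uses the first column as the row labels.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.index.name)  # company

# Setting the name to None only removes that label; the company
# names remain the row labels.
sample.index.name = None
print(list(sample.index))

# .shape is an attribute (no parentheses): (rows, columns)
print(sample.shape)  # (3, 3)
print(type(sample))
```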

## **INTRODUCTION TO DATAFRAMES**

- Our analysis above shows that our dataframe has 500 rows and 16 columns. As you may have noticed, one advantage that dataframes have over NumPy arrays is that their rows and columns are labelled, which makes our dataset easier to read.


- Recall that I stated that the axis labels of a dataframe can be either numeric or strings. Let's confirm this using the pandas `DataFrame.head()` method, which returns the first rows of a dataframe. **By default it returns the first five rows of the dataset**, but a good tip is that it takes an optional argument that lets us specify the number of rows we want. Let's examine this:

In [2]:
default_head = fortune_500.head()
specified_head = fortune_500.head(6)
print ("The default results without assigning a specified number of head are:")
print (default_head)
print('\n')
print ("The specified number of head results are:")
print (specified_head)

The default results without assigning a specified number of head are:
                          rank  revenues  revenue_change  profits  assets  \
Walmart                      1    485873             0.8  13643.0  198825   
State Grid                   2    315199            -4.4   9571.3  489838   
Sinopec Group                3    267518            -9.1   1257.9  310726   
China National Petroleum     4    262573           -12.3   1867.5  585619   
Toyota Motor                 5    254694             7.7  16899.3  437575   

                          profit_change                  ceo  \
Walmart                            -7.2  C. Douglas McMillon   
State Grid                         -6.2              Kou Wei   
Sinopec Group                     -65.0            Wang Yupu   
China National Petroleum          -73.7        Zhang Jianhua   
Toyota Motor                      -12.3          Akio Toyoda   

                                          industry                  sector  \
Walm

- In the same vein, we can use the `DataFrame.tail()` method in pandas to get the last 5 rows of the dataframe, or specify the number of last rows that we want, just like we did for the `.head()` method. Isn't that just great?!


- Ready to experiment?!

In [3]:
default_tail = fortune_500.tail()
specified_tail = fortune_500.tail(6)
print ("The default results without assigning a specified number of tail are:")
print (default_tail)
print('\n')
print ("The specified number of tail results are:")
print (specified_tail)

The default results without assigning a specified number of tail are:
                                rank  revenues  revenue_change  profits  \
Teva Pharmaceutical Industries   496     21903            11.5    329.0   
New China Life Insurance         497     21796           -13.3    743.9   
Wm. Morrison Supermarkets        498     21741           -11.3    406.4   
TUI                              499     21655            -5.5   1151.7   
AutoNation                       500     21609             3.6    430.5   

                                assets  profit_change                 ceo  \
Teva Pharmaceutical Industries   92890          -79.3   Yitzhak Peterburg   
New China Life Insurance        100609          -45.6            Wan Feng   
Wm. Morrison Supermarkets        11630           20.4      David T. Potts   
TUI                              16247          195.5   Friedrich Joussen   
AutoNation                       10060           -2.7  Michael J. Jackson   
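
- If you don't have the dataset at hand, you can try `.head()` and `.tail()` on a small dataframe. This sketch rebuilds the ranks and revenues of the top ten companies printed earlier; the name `top_ten` is just illustrative.

```python
import pandas as pd

# Ranks and revenues of the top ten companies, as printed earlier.
top_ten = pd.DataFrame(
    {
        "rank": list(range(1, 11)),
        "revenues": [485873, 315199, 267518, 262573, 254694,
                     240264, 240033, 223604, 215639, 205004],
    },
    index=["Walmart", "State Grid", "Sinopec Group", "China National Petroleum",
           "Toyota Motor", "Volkswagen", "Royal Dutch Shell",
           "Berkshire Hathaway", "Apple", "Exxon Mobil"],
)

print(len(top_ten.head()))    # 5 rows by default
print(len(top_ten.head(6)))   # 6 rows when we ask for 6
print(top_ten.tail(2))        # the last two rows: Apple and Exxon Mobil
```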

                

- Moving on to the second advantage of dataframes over NumPy arrays, I stated that **dataframes can contain columns with multiple data types, including integer, float, and string.**


- We can confirm this using the pandas `DataFrame.dtypes` attribute, which lists the data type of each column of the dataset. NumPy arrays have a similar attribute, `ndarray.dtype` (singular), but because an array holds only one data type it returns a single value.

- Also, we could get an overview of our dataset including its shape and other information using the `DataFrame.info()` method.

- Let's confirm the data types and get an overview of the Fortune 500 list below:

In [4]:
fortune_500_dtype = fortune_500.dtypes
print ("The data types contained within the dataset are:")
print (fortune_500_dtype)
print ('\n')
fortune_500.info() 
print ('\n')
fortune_500_overview = fortune_500.info # no parentheses: this stores the method itself and does not call it
print ("The overview of the dataset is shown below:")
print (fortune_500_overview)

The data types contained within the dataset are:
rank                          int64
revenues                      int64
revenue_change              float64
profits                     float64
assets                        int64
profit_change               float64
ceo                          object
industry                     object
sector                       object
previous_rank                 int64
country                      object
hq_location                  object
website                      object
years_on_global_500_list      int64
employees                     int64
total_stockholder_equity      int64
dtype: object


<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change       

- From our result above, you can find 3 different data types in the dataset: `int64`, `float64` and `object` (the type pandas uses here for the string columns). The `.info()` output also gives us a summary of the whole dataset, so we don't have to go through it manually.


- The first call, `fortune_500.info()`, shows us the number of entries in our index (representing the number of rows), a list of each column with its dtype and number of non-null values, as well as a summary of the different dtypes and the memory usage. The second, `fortune_500.info` without brackets, does not call the method at all: it refers to the method object itself, so printing it displays a description of that method, which happens to include the contents of the dataframe it belongs to. To get the summary, always write `.info()` with the brackets.
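
- Here is a minimal sketch of `.dtypes` and of the difference between `.info()` and `.info`. It uses a three-row dataframe built from values printed earlier; redirecting the summary into an `io.StringIO` buffer through the `buf` parameter is my own choice, made only so that the printed text can be inspected.

```python
import io
import pandas as pd

sample = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "profits": [13643.0, 9571.3, 1257.9],
        "ceo": ["C. Douglas McMillon", "Kou Wei", "Wang Yupu"],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# .dtypes is a Series with one entry per column.
print(sample.dtypes)

# .info() prints its summary and returns None, so there is nothing
# useful to store in a variable.
buffer = io.StringIO()
result = sample.info(buf=buffer)  # buf redirects the printed summary
print(result)                     # None
print(buffer.getvalue())

# Without parentheses, .info is the method object itself, not a summary.
print(callable(sample.info))      # True
```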

## **SELECTING A COLUMN FROM A DATAFRAME** 

- When selecting a column from a NumPy array, we need to know the exact index position at which it is located. A pandas dataframe, as we already know, has labels on both axes, so elements can be selected from a dataframe by their labels using this attribute:

    `df.loc[row_label, column_label]`
    
    
- Let us experiment with this attribute below:


In [5]:
fortune_500_columnonly = fortune_500.loc[:, 'revenues']
fortune_500_rowonly = fortune_500.loc['State Grid', :]
fortune_500_rowcolumn = fortune_500.loc['State Grid', 'revenues']
print ("The Revenues column elements are:")
print (fortune_500_columnonly)
print ('\n')
print ("The State Grid row elements are:")
print (fortune_500_rowonly)
print ('\n')
print ("The element at the State Grid row and revenues column is:")
print (fortune_500_rowcolumn)

The Revenues column elements are:
Walmart                           485873
State Grid                        315199
Sinopec Group                     267518
China National Petroleum          262573
Toyota Motor                      254694
                                   ...  
Teva Pharmaceutical Industries     21903
New China Life Insurance           21796
Wm. Morrison Supermarkets          21741
TUI                                21655
AutoNation                         21609
Name: revenues, Length: 500, dtype: int64


The State Grid row elements are:
rank                                             2
revenues                                    315199
revenue_change                                -4.4
profits                                     9571.3
assets                                      489838
profit_change                                 -6.2
ceo                                        Kou Wei
industry                                 Utilities
sector                        
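
- The three kinds of `loc` selection used above can be tried on a small dataframe too. This is only a sketch: `sample` holds three rows copied from the dataset, and the variable names are illustrative.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenues": [485873, 315199, 267518],
        "ceo": ["C. Douglas McMillon", "Kou Wei", "Wang Yupu"],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

column = sample.loc[:, "revenues"]            # every row, one column -> Series
row = sample.loc["State Grid", :]             # one row, every column -> Series
value = sample.loc["State Grid", "revenues"]  # one row, one column -> single value

print(type(column).__name__, type(row).__name__)  # Series Series
print(value)                                      # 315199
```

- The colon `:` means "everything along this axis", exactly as it does in NumPy.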

- We can also select a whole column using just its label inside square brackets, which is a shorthand for `df.loc[:, column_label]`. Let's examine this below:

In [6]:
industries = fortune_500['industry']
industries_type = type(industries)
print (industries)
print ('\n')
print (industries_type)

Walmart                                     General Merchandisers
State Grid                                              Utilities
Sinopec Group                                  Petroleum Refining
China National Petroleum                       Petroleum Refining
Toyota Motor                             Motor Vehicles and Parts
                                               ...               
Teva Pharmaceutical Industries                    Pharmaceuticals
New China Life Insurance          Insurance: Life, Health (stock)
Wm. Morrison Supermarkets                    Food and Drug Stores
TUI                                               Travel Services
AutoNation                                    Specialty Retailers
Name: industry, Length: 500, dtype: object


<class 'pandas.core.series.Series'>


## **INTRODUCTION TO SERIES**

- In the code above, you should observe that when you select a single column from a dataframe, you end up with an object of a type known as a **Series**. A Series is the one-dimensional pandas object: a single column (or a single row) of a dataframe. In other words, **a dataframe is a collection of Series.** So, going forward, you need to pay attention to which of the objects are dataframes and which are Series.

- Now, we will learn how to select multiple columns and rows within a dataframe using their labels.


In [9]:
# to select specific columns with all the rows selected, we use this approach:
selection_a = fortune_500.loc[:, ["country", "rank"]]
# OR
selection_b = fortune_500[["country", "rank"]]
print ("Method A using the 'loc' option gives us:")
print (selection_a)
print ('\n')
print ("Method B using the ordinary labels give us:")
print (selection_b)

Method A using the 'loc' option gives us:
                                country  rank
Walmart                             USA     1
State Grid                        China     2
Sinopec Group                     China     3
China National Petroleum          China     4
Toyota Motor                      Japan     5
...                                 ...   ...
Teva Pharmaceutical Industries   Israel   496
New China Life Insurance          China   497
Wm. Morrison Supermarkets       Britain   498
TUI                             Germany   499
AutoNation                          USA   500

[500 rows x 2 columns]


Method B using the ordinary labels give us:
                                country  rank
Walmart                             USA     1
State Grid                        China     2
Sinopec Group                     China     3
China National Petroleum          China     4
Toyota Motor                      Japan     5
...                                 ...   ...
Teva Pharmaceu
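
- To check for yourself that both methods return the same thing, and to see when you get a Series and when you get a dataframe, here is a small sketch. The three rows are copied from the dataset; `.equals()` is the pandas method for comparing two objects.

```python
import pandas as pd

sample = pd.DataFrame(
    {"country": ["USA", "China", "Japan"], "rank": [1, 2, 5]},
    index=["Walmart", "State Grid", "Toyota Motor"],
)

one_column = sample["country"]       # a single label -> Series
one_column_df = sample[["country"]]  # a list with one label -> DataFrame
print(type(one_column).__name__)     # Series
print(type(one_column_df).__name__)  # DataFrame

# The explicit and the shorthand forms return the same data.
print(sample.loc[:, ["country", "rank"]].equals(sample[["country", "rank"]]))  # True
print(sample.loc[:, "country"].equals(sample["country"]))                      # True
```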

- The above results show us that **whether or not we use the `loc` option, we can select columns from a dataframe in the same way.**


- We can also **slice objects using their labels** which is shown below:

In [10]:
sliced_columns = fortune_500.loc[:, "rank":"profits"] # a slice of columns: every row, columns 'rank' to 'profits'
print (sliced_columns)

                                rank  revenues  revenue_change  profits
Walmart                            1    485873             0.8  13643.0
State Grid                         2    315199            -4.4   9571.3
Sinopec Group                      3    267518            -9.1   1257.9
China National Petroleum           4    262573           -12.3   1867.5
Toyota Motor                       5    254694             7.7  16899.3
...                              ...       ...             ...      ...
Teva Pharmaceutical Industries   496     21903            11.5    329.0
New China Life Insurance         497     21796           -13.3    743.9
Wm. Morrison Supermarkets        498     21741           -11.3    406.4
TUI                              499     21655            -5.5   1151.7
AutoNation                       500     21609             3.6    430.5

[500 rows x 4 columns]


- In summary, the data selection methods we have learnt and their alternative methods are as follows:

| **Select by Label** | **Explicit Syntax** | **Alternative Shorthand** |
| --- | --- | --- |
| Single column | `df.loc[:,"col1"]` | `df["col1"]` |
| List of columns | `df.loc[:,["col1", "col7"]]` | `df[["col1", "col7"]]` |
| Slice of columns | `df.loc[:,"col1":"col4"]` | |


- Let's experiment more with these methods below:

In [14]:
f500_country = fortune_500.loc[:, "country"]
f500_revenueyears = fortune_500.loc[:, ["revenues", "years_on_global_500_list"]]
ceo_to_sector = fortune_500.loc[:, "ceo":"sector"] # to select all columns from 'ceo' up to and including the 'sector' column.
# Note that the normal rule of slicing, where the end of the slice is not included in the result, does not apply to label selection in dataframes: the end label is included.
print ('F500 COUNTRY')
print (f500_country)
print('\n')
print ('F500 REVENUE YEARS')
print (f500_revenueyears)
print ('\n')
print ('CEO TO SECTOR COLUMN')
print (ceo_to_sector)

F500 COUNTRY
Walmart                               USA
State Grid                          China
Sinopec Group                       China
China National Petroleum            China
Toyota Motor                        Japan
                                   ...   
Teva Pharmaceutical Industries     Israel
New China Life Insurance            China
Wm. Morrison Supermarkets         Britain
TUI                               Germany
AutoNation                            USA
Name: country, Length: 500, dtype: object


F500 REVENUE YEARS
                                revenues  years_on_global_500_list
Walmart                           485873                        23
State Grid                        315199                        17
Sinopec Group                     267518                        19
China National Petroleum          262573                        17
Toyota Motor                      254694                        23
...                                  ...                    
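
- The point about the end label being included is easy to miss, so here is a minimal sketch that puts a label slice next to an ordinary Python slice. The two rows are copied from the dataset; the names `by_label` and `names` are just illustrative.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "rank": [1, 2],
        "revenues": [485873, 315199],
        "revenue_change": [0.8, -4.4],
        "profits": [13643.0, 9571.3],
        "assets": [198825, 489838],
    },
    index=["Walmart", "State Grid"],
)

# Label slice: the end label "profits" is part of the result.
by_label = sample.loc[:, "rank":"profits"]
print(list(by_label.columns))
# ['rank', 'revenues', 'revenue_change', 'profits']

# Ordinary Python slice of the column names: the stop position is left out.
names = list(sample.columns)
print(names[0:3])
# ['rank', 'revenues', 'revenue_change']
```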

## **SELECTING ROWS FROM A DATAFRAME USING A LABEL**

- We have learnt how to select columns from a pandas dataframe either as a dataframe (more than one column) or as a Series (one column). Now, we want to learn how to select the rows using the same method.


- Not to waste your time: just bear in mind that the `loc` methods for selecting either a Series or a dataframe by column label **also apply to rows**. The one difference is in the shorthand: a slice inside plain square brackets, such as `df["row1":"row5"]`, selects rows rather than columns. Let me show you:

In [17]:
series_row = fortune_500.loc["Sinopec Group"]
print ("SERIES ROW:")
print (series_row)
print ('\n')
# to select a list of rows
list_row = fortune_500.loc[["Toyota Motor", "Walmart"]]
print ("LIST ROW:")
print (list_row)
print ('\n')
# to select a slice of rows. Note that this shorthand works for rows only; to slice columns we must use loc
sliced_rows = fortune_500["State Grid": "Toyota Motor"]
print ("SLICED ROWS:")
print (sliced_rows)

SERIES ROW:
rank                                             3
revenues                                    267518
revenue_change                                -9.1
profits                                     1257.9
assets                                      310726
profit_change                                  -65
ceo                                      Wang Yupu
industry                        Petroleum Refining
sector                                      Energy
previous_rank                                    4
country                                      China
hq_location                         Beijing, China
website                     http://www.sinopec.com
years_on_global_500_list                        19
employees                                   713288
total_stockholder_equity                    106523
Name: Sinopec Group, dtype: object


LIST ROW:
              rank  revenues  revenue_change  profits  assets  profit_change  \
Toyota Motor     5    254694             7.7 

In [18]:
toyota = fortune_500.loc["Toyota Motor", :]
drink_companies = fortune_500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"], :]
middle_companies = fortune_500.loc["Tata Motors":"Nationwide","rank":"country"]
print ("TOYOTA:")
print (toyota)
print ('\n')
print ("DRINK COMPANIES:")
print (drink_companies)
print ('\n')
print ("MIDDLE COMPANIES:")
print (middle_companies)

TOYOTA:
rank                                                   5
revenues                                          254694
revenue_change                                       7.7
profits                                          16899.3
assets                                            437575
profit_change                                      -12.3
ceo                                          Akio Toyoda
industry                        Motor Vehicles and Parts
sector                            Motor Vehicles & Parts
previous_rank                                          8
country                                            Japan
hq_location                                Toyota, Japan
website                     http://www.toyota-global.com
years_on_global_500_list                              23
employees                                         364445
total_stockholder_equity                          157210
Name: Toyota Motor, dtype: object


DRINK COMPANIES:
                      rank 
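
- Here is a compact sketch of the row selections from this section on a five-row dataframe copied from the top of the dataset. The last part is an extra check of mine: it shows that the shorthand with a single label looks for a column, so a single row always needs `loc`.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "profits": [13643.0, 9571.3, 1257.9, 1867.5, 16899.3],
    },
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
)

single_row = sample.loc["Sinopec Group"]            # one row -> Series
row_list = sample.loc[["Toyota Motor", "Walmart"]]  # list of rows, in the order asked
row_slice = sample["State Grid":"Toyota Motor"]     # shorthand slice selects rows
block = sample.loc["State Grid":"Sinopec Group", "rank":"revenues"]  # rows and columns

print(list(row_list.index))   # ['Toyota Motor', 'Walmart']
print(list(row_slice.index))
print(block.shape)            # (2, 2)

# The shorthand with a single label looks for a column, not a row.
try:
    sample["Walmart"]
except KeyError:
    print("sample['Walmart'] raises KeyError; use sample.loc['Walmart'] for a row")
```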

## **VALUE COUNTS METHOD** 

- Because a Series is a different object from a dataframe, it has attributes and methods of its own. Here, we will examine the `Series.value_counts()` method. This method displays each unique non-null value in a column along with its count, ordered from the most to the least frequent.


- The process of using this method is shown as follows:

    - We select a column from the dataframe first and assign it to a variable.
    
    - Then, we call `.value_counts()` on that variable, i.e. we replace the 'Series' in `Series.value_counts()` with the name of the variable we assigned the selected column to.

In [20]:
sectors = fortune_500["sector"]
print (type(sectors))
value_sectors = sectors.value_counts()
print ("Applying VALUE COUNTS to a SERIES:")
print (value_sectors)
print ('\n')
# selecting more than one column i.e. applying the method to a dataframe
sector_dataframe = fortune_500[["sector", "industry"]]
print (type(sector_dataframe))
value_sectordataframe = sector_dataframe.value_counts()
print ("Applying VALUE COUNTS to a DATAFRAME:")
print (value_sectordataframe)

<class 'pandas.core.series.Series'>
Applying VALUE COUNTS to a SERIES:
Financials                       118
Energy                            80
Technology                        44
Motor Vehicles & Parts            34
Wholesalers                       28
Health Care                       27
Food & Drug Stores                20
Transportation                    19
Telecommunications                18
Retailing                         17
Materials                         16
Food, Beverages & Tobacco         16
Industrials                       15
Aerospace & Defense               14
Engineering & Construction        13
Chemicals                          7
Media                              3
Business Services                  3
Household Products                 3
Hotels, Restaurants & Leisure      3
Apparel                            2
Name: sector, dtype: int64


<class 'pandas.core.frame.DataFrame'>


AttributeError: 'DataFrame' object has no attribute 'value_counts'

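- As we can see from above, calling the `value_counts()` method on a dataframe raises an `AttributeError` in the pandas version used here, because in that version the method belongs only to Series. (Since pandas 1.1, dataframes have a `DataFrame.value_counts()` method as well, which counts unique rows rather than the values of a single column.)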
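
- Here is a small sketch of `Series.value_counts()` that you can run without the dataset. The ten countries are those of the bottom ten companies in the list; the extra `None` at the end is a made-up missing value, added only to show that nulls are left out of the count by default.

```python
import pandas as pd

countries_sample = pd.Series(
    ["Britain", "USA", "Italy", "China", "China",
     "Israel", "China", "Britain", "Germany", "USA",
     None]  # hypothetical missing value, not in the real data
)

counts = countries_sample.value_counts()
print(counts)
print(counts.index[0], counts["China"])  # China 3
print(counts.sum())                      # 10: the missing value is not counted

# dropna=False also counts the missing value.
print(countries_sample.value_counts(dropna=False).sum())  # 11
```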

In [21]:
countries = fortune_500['country']
print (type(countries))
countries_valuecount = countries.value_counts()
print (countries_valuecount)

<class 'pandas.core.series.Series'>
USA             132
China           109
Japan            51
Germany          29
France           29
Britain          24
South Korea      15
Netherlands      14
Switzerland      14
Canada           11
Spain             9
India             7
Australia         7
Italy             7
Brazil            7
Taiwan            6
Russia            4
Ireland           4
Singapore         3
Sweden            3
Mexico            2
Turkey            1
Israel            1
Venezuela         1
Finland           1
Malaysia          1
Indonesia         1
Norway            1
Thailand          1
Luxembourg        1
Denmark           1
Belgium           1
U.A.E             1
Saudi Arabia      1
Name: country, dtype: int64


- The `value_counts()` method gave us the number of companies for every country in the dataset. But what happens when we want the count for just one specific country? How do we go about this?


- Yippee! There's a solution to this, and in fact one which you are already familiar with: **USING `LOC`**!!! The result of `value_counts()` is itself a Series whose labels are the countries, so we can select from it by label. In addition, there are shorthand alternatives for these operations. Let me show you:

| **Select by Label** | **Explicit Syntax** | **Alternative Shorthand** |
| --- | --- | --- |
| Single item from series | `s.loc["item8"]` | `s["item8"]` |
| List of items from series | `s.loc[["item1","item7"]]` | `s[["item1","item7"]]` |
| Slice of items from series | `s.loc["item2":"item4"]` | `s["item2":"item4"]` |


- Now, let us practice by selecting the value counts of a few specific countries.

In [22]:
specific_valuecounts = countries_valuecount.loc[["India","USA","Canada","Mexico"]]
print (specific_valuecounts)

India       7
USA       132
Canada     11
Mexico      2
Name: country, dtype: int64
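
- The three rows of the table above can be sketched on a short Series as well. The five counts are copied from the country value counts printed earlier; the name `country_counts` is just illustrative.

```python
import pandas as pd

# A few of the country counts printed above.
country_counts = pd.Series(
    [132, 109, 51, 29, 29],
    index=["USA", "China", "Japan", "Germany", "France"],
)

print(country_counts.loc["China"], country_counts["China"])  # 109 109
print(country_counts.loc[["France", "USA"]])                 # list of labels, in the order asked
print(country_counts["China":"Germany"])                     # the slice includes Germany
```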


## **CLOSING CHALLENGE**

- After all that we have learnt in this lesson, it is only fair that we examine the lessons and put them to practice. Here's a table summarizing all lessons & shortcuts:

| **Select by Label** | **Explicit Syntax** | **Alternative Shorthand** |
| --- | --- | --- |
| Single item from series | `s.loc["item8"]` | `s["item8"]` |
| List of items from series | `s.loc[["item1","item7"]]` | `s[["item1","item7"]]` |
| Slice of items from series | `s.loc["item2":"item4"]` | `s["item2":"item4"]` |
| Single column | `df.loc[:,"col1"]` | `df["col1"]` |
| List of columns | `df.loc[:,["col1", "col7"]]` | `df[["col1", "col7"]]` |
| Slice of columns | `df.loc[:,"col1":"col4"]` | |
| Single row from dataframe | `df.loc["row4"]` |  |
| List of rows from dataframe | `df.loc[["row1", "row8"]]` |  |
| Slice of rows from dataframe | `df.loc["row3":"row5"]` | `df["row3":"row5"]` |


- **LET'S PUT ALL OUR HARDWORK INTO PRACTICE!**

In [23]:
# we want to select some companies whose rank changed a lot (the big movers) and the companies at the bottom of the list.
big_movers = fortune_500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank","previous_rank"]]
bottom_companies = fortune_500.loc["National Grid":"AutoNation",["rank", "sector", "country"]]
print ("THE BIG MOVERS:")
print (big_movers)
print ('\n')
print ("THE BOTTOM COMPANIES:")
print (bottom_companies)

THE BIG MOVERS:
              rank  previous_rank
Aviva           90            279
HP             194             48
JD.com         261            366
BHP Billiton   350            168


THE BOTTOM COMPANIES:
                                       rank              sector  country
National Grid                           491              Energy  Britain
Dollar General                          492           Retailing      USA
Telecom Italia                          493  Telecommunications    Italy
Xiamen ITG Holding Group                494         Wholesalers    China
Xinjiang Guanghui Industry Investment   495         Wholesalers    China
Teva Pharmaceutical Industries          496         Health Care   Israel
New China Life Insurance                497          Financials    China
Wm. Morrison Supermarkets               498  Food & Drug Stores  Britain
TUI                                     499   Business Services  Germany
AutoNation                              500           Retail

- In this lesson, we learned:

    - How pandas and NumPy combine to make working with data easier.
    
    - About the two core pandas types: series and dataframes.
    
    - How to select data from pandas objects using axis labels.

- In the next lesson, we'll continue to learn about exploring data in pandas, including:

    - How to select data from pandas objects using boolean arrays.
    
    - How to assign data using labels and boolean arrays.
    
    - How to create new rows and columns in pandas.
    
    - New methods to make data analysis easier in pandas.


- Now, **GO AND BEGIN USING PANDAS LIKE A PRO!!**