# Pandas and NumPy Fundamentals

## Introduction to pandas
### Summary of Label-Based Selection Syntax
|         Select by Label         	|      Explicit Syntax      	| Shorthand Convention 	|
|:-------------------------------:	|:-------------------------:	|:--------------------:	|
| Single column from dataframe    	| df.loc[:,"col1"]          	| df["col1"]           	|
| List of columns from dataframe  	| df.loc[:,["col1","col7"]] 	| df[["col1","col7"]]  	|
| Slice of columns from dataframe 	| df.loc[:,"col1":"col4"]   	|                      	|
| Single row from dataframe       	| df.loc["row4"]            	|                      	|
| List of rows from dataframe     	| df.loc[["row1", "row8"]]  	|                      	|
| Slice of rows from dataframe    	| df.loc["row3":"row5"]     	| df["row3":"row5"]    	|
| Single item from series         	| s.loc["item8"]            	| s["item8"]           	|
| List of items from series       	| s.loc[["item1","item7"]]  	| s[["item1","item7"]] 	|
| Slice of items from series      	| s.loc["item2":"item4"]    	| s["item2":"item4"]   	|
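
As a quick check on the table above, the following minimal sketch compares a few explicit `loc` forms with their shorthand equivalents. The dataframe and its labels (`col1`, `row1`, and so on) are hypothetical toy data, not part of the f500 dataset.

```python
import pandas as pd

# Hypothetical toy data; the labels are illustrative only.
df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)

# Single column: the explicit and shorthand forms return the same series.
same_column = df.loc[:, "col1"].equals(df["col1"])

# List of columns: both forms return the same dataframe.
same_columns = df.loc[:, ["col1", "col3"]].equals(df[["col1", "col3"]])

# Slice with the shorthand: this selects rows, not columns.
same_rows = df.loc["row1":"row2"].equals(df["row1":"row2"])

print(same_column, same_columns, same_rows)
```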

In this mission, we learned:

- How pandas and NumPy combine to make working with data easier.
- About the two core pandas types: series and dataframes.
- How to select data from pandas objects using axis labels.

In the next mission, we'll continue to learn about exploring data in pandas, including:

- How to select data from pandas objects using boolean arrays.
- How to assign data using labels and boolean arrays.
- How to create new rows and columns in pandas.
- New methods to make data analysis easier in pandas.

### Introduction to the Data
1. Use Python's type() function to assign the type of f500 to f500_type.
2. Use the DataFrame.shape attribute to assign the shape of f500 to f500_shape.
3. After you have run your code, use the variable inspector to look at the variables f500, f500_type, and f500_shape.

In [1]:
import pandas as pd
f500 = pd.read_csv('f500.csv',index_col=0)
f500.index.name = None
f500_type = type(f500)
f500_shape = f500.shape
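
Since f500.csv may not be at hand, here is a small self-contained sketch of the same steps. The CSV text, the company names, and the `toy` variable are made up for illustration; only the pandas calls mirror the cell above.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for f500.csv; the values are made up.
csv_text = """company,rank,country
Alpha,1,USA
Beta,2,China
Gamma,3,USA
"""

# index_col=0 turns the first CSV column into the row labels.
toy = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Clearing the index name drops the "company" header from the index.
toy.index.name = None

toy_type = type(toy)
toy_shape = toy.shape  # (rows, columns); the index is not counted as a column

print(toy_type, toy_shape)
```

Because the first column becomes the index, the shape reports two columns rather than three.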

### Introducing DataFrames

As in the previous missions, the f500 variable created on the previous screen is available here.

1. Use the head() method to select the first 6 rows. Assign the result to f500_head.
2. Use the tail() method to select the last 8 rows. Assign the result to f500_tail.
3. After you have run your code, use the variable inspector and output to view information about the dataframe.

In [2]:
f500_head = f500.head(6)
f500_tail = f500.tail(8)
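
The sketch below shows how head() and tail() behave on a small hypothetical dataframe (`toy`, with made-up row labels), including what happens when no argument is passed.

```python
import pandas as pd

# Hypothetical ten-row dataframe with labels row0 .. row9.
toy = pd.DataFrame({"value": range(10)}, index=[f"row{i}" for i in range(10)])

first_three = toy.head(3)   # first three rows
last_two = toy.tail(2)      # last two rows
default_head = toy.head()   # with no argument, n defaults to 5

print(first_three.index.tolist())
print(last_two.index.tolist())
print(len(default_head))
```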

### Introducing DataFrames Continued
1. Use the DataFrame.info() method to display information about the f500 dataframe.
2. After you have run your code, use the variable inspector and output to view information about the dataframe.

In [3]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change            498 non-null    float64
 3   profits                   499 non-null    float64
 4   assets                    500 non-null    int64  
 5   profit_change             436 non-null    float64
 6   ceo                       500 non-null    object 
 7   industry                  500 non-null    object 
 8   sector                    500 non-null    object 
 9   previous_rank             500 non-null    int64  
 10  country                   500 non-null    object 
 11  hq_location               500 non-null    object 
 12  website                   500 non-null    object 
 13  years_on_global_500_list  500 non-null    int64  
 14  em

### Selecting a Column From a DataFrame by Label
1. Select the industry column. Assign the result to the variable name industries.
2. Use Python's type() function to assign the type of industries to industries_type.
3. After you have run your code, use the variable inspector to look at the variables.

In [16]:
industries = f500["industry"]
industries_type = type(industries)
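
To illustrate why the type of a selected column matters, this minimal sketch contrasts selecting with a single label against selecting with a one-element list. The `toy` dataframe and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for f500.
toy = pd.DataFrame(
    {"industry": ["Retail", "Energy"], "country": ["USA", "China"]},
    index=["Alpha", "Beta"],
)

# A single label returns a one-dimensional series.
as_series = toy["industry"]

# A list of labels, even a list of one, returns a two-dimensional dataframe.
as_frame = toy[["industry"]]

print(type(as_series).__name__, type(as_frame).__name__)
```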

### Selecting Columns From a DataFrame by Label Continued
1. Select the country column. Assign the result to the variable name countries.
2. In order, select the revenues and years_on_global_500_list columns. Assign the result to the variable name revenues_years.
3. In order, select all columns from ceo up to and including sector. Assign the result to the variable name ceo_to_sector.
4. After you have run your code, use the variable inspector to view the variables.

In [9]:
countries = f500['country']
revenues_years = f500[["revenues", "years_on_global_500_list"]]
ceo_to_sector = f500.loc[:, "ceo":"sector"]
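
Two details of the selections above are easy to miss: a list of labels returns columns in the order requested, and a label slice includes its end label. The sketch below shows both on a hypothetical one-row dataframe whose column names are borrowed from f500 for readability.

```python
import pandas as pd

# Hypothetical single-row dataframe; the values are placeholders.
toy = pd.DataFrame(
    {"ceo": ["A"], "industry": ["B"], "sector": ["C"], "country": ["D"]},
    index=["Alpha"],
)

# A list of labels returns the columns in the order given, not the original order.
reordered = toy[["country", "ceo"]]

# Unlike positional slices, a label slice includes the end label.
ceo_to_sector = toy.loc[:, "ceo":"sector"]

print(reordered.columns.tolist())
print(ceo_to_sector.columns.tolist())
```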

### Selecting Rows From a DataFrame by Label

- By selecting data from f500:
    1. Create a new variable, toyota, with:
        - Just the row with index Toyota Motor.
        - All columns.
    2. Create a new variable, drink_companies, with:
        - Rows with indices Anheuser-Busch InBev, Coca-Cola, and Heineken Holding, in that order.
        - All columns.
    3. Create a new variable, middle_companies, with:
        - All rows with indices from Tata Motors to Nationwide, inclusive.
        - All columns from rank to country, inclusive.

In [32]:
toyota = f500.loc["Toyota Motor"]
drink_companies = f500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"]]
middle_companies = f500.loc["Tata Motors":"Nationwide", "rank":"country"]
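
The three row selections return different shapes of result. The following sketch, using a hypothetical four-row dataframe with made-up company names, shows what each form gives back.

```python
import pandas as pd

# Hypothetical toy data; company names and values are illustrative only.
toy = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4],
        "sector": ["A", "B", "A", "C"],
        "country": ["USA", "China", "Japan", "USA"],
    },
    index=["Alpha", "Beta", "Gamma", "Delta"],
)

# A single row label returns a series whose index is the column names.
one_row = toy.loc["Beta"]

# A list of row labels returns a dataframe in the requested order.
two_rows = toy.loc[["Gamma", "Alpha"]]

# Label slices on both axes include both endpoints.
block = toy.loc["Beta":"Gamma", "rank":"sector"]

print(type(one_row).__name__, two_rows.index.tolist(), block.shape)
```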

### Value Counts Method
We've already saved a selection of data from f500 to a dataframe named f500_sel.

1. Find the counts of each unique value in the country column in the f500_sel dataframe.
    - Select the country column in the f500_sel dataframe. Assign it to a variable named countries.
    - Use the Series.value_counts() method to return the value counts for countries. Assign the results to country_counts.

In [39]:
f500_sel = f500.head(6)
countries = f500_sel["country"]
country_counts = countries.value_counts()
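
As a rough illustration of what Series.value_counts() returns, the sketch below counts values in a small hypothetical series. The unique values become the index of the result, and by default the counts are sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical country column; the company labels are made up.
countries = pd.Series(
    ["USA", "China", "USA", "Japan", "China", "USA"],
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)

# The result is a series: unique values as the index, counts as the values.
counts = countries.value_counts()

print(counts)
```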

### Selecting Items from a Series by Label
- From the pandas series countries_counts:
    1. Select the item at index label India. Assign the result to the variable name india.
    2. In order, select the items with index labels USA, Canada, and Mexico. Assign the result to the variable name north_america.

In [33]:
countries = f500['country']
countries_counts = countries.value_counts()
india = countries_counts["India"]
north_america = countries_counts[["USA", "Canada", "Mexico"]]
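
Selecting from a series by label follows the same pattern as selecting from a dataframe. In this sketch the counts are made-up numbers, not the real f500 figures: a single label returns a scalar, and a list of labels returns a smaller series.

```python
import pandas as pd

# Hypothetical counts; these are not the real f500 values.
counts = pd.Series([5, 3, 2, 1], index=["USA", "India", "Canada", "Mexico"])

# A single label returns the stored value itself.
india = counts["India"]

# A list of labels returns a series in the requested order.
north_america = counts[["USA", "Canada", "Mexico"]]

print(india)
print(north_america)
```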

### Summary Challenge
By selecting data from f500:

1. Create a new variable, big_movers, with:
    - Rows with indices Aviva, HP, JD.com, and BHP Billiton, in that order.
    - The rank and previous_rank columns, in that order.
2. Create a new variable, bottom_companies, with:
    - All rows with indices from National Grid to AutoNation, inclusive.
    - The rank, sector, and country columns.

In [None]:
big_movers = f500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank", "previous_rank"]]
bottom_companies = f500.loc["National Grid":"AutoNation", ["rank", "sector", "country"]]
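
A slice such as "National Grid":"AutoNation" follows the order of the rows in the index, not alphabetical order. The sketch below demonstrates this on a hypothetical dataframe whose labels are deliberately unsorted. It assumes the index labels are unique, which is what makes slicing an unsorted index work.

```python
import pandas as pd

# Hypothetical toy data with a unique, deliberately unsorted index.
toy = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4],
        "previous_rank": [2, 1, 4, 3],
        "country": ["USA", "China", "Japan", "USA"],
    },
    index=["Zeta", "Alpha", "Mid", "Beta"],
)

# Lists on both axes: rows and columns come back in the order requested.
picked = toy.loc[["Beta", "Zeta"], ["rank", "previous_rank"]]

# The row slice runs from "Alpha" to "Beta" by position in the index.
tail_block = toy.loc["Alpha":"Beta", ["rank", "country"]]

print(picked)
print(tail_block)
```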
