# Pandas and NumPy Fundamentals

## Exploring Data with pandas: Fundamentals
In this mission, we learned:

- How to select data from pandas objects using boolean arrays.
- How to assign data using labels and boolean arrays.
- How to create new rows and columns in pandas.
- Many new methods to make data analysis easier in pandas.

In the next mission, we'll continue to learn about exploring data in pandas, including:

- New ways of creating dataframe and series objects.
- Advanced selection techniques.
- Performing more complex analysis.

### Introduction to the Data
We've already read the data set into a pandas dataframe and assigned it to a variable named f500.

1. Use the DataFrame.head() method to select the first 10 rows in f500. Assign the result to f500_head.
2. Use the DataFrame.info() method to display information about the dataframe.

In [1]:
```python
import pandas as pd
import numpy as np

# The first CSV column (company names) becomes the row index
f500 = pd.read_csv('../f500.csv', index_col=0)
```

In [2]:
```python
f500_head = f500.head(10)
f500_head.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, Walmart to Exxon Mobil
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      10 non-null     int64  
 1   revenues                  10 non-null     int64  
 2   revenue_change            10 non-null     float64
 3   profits                   10 non-null     float64
 4   assets                    10 non-null     int64  
 5   profit_change             8 non-null      float64
 6   ceo                       10 non-null     object 
 7   industry                  10 non-null     object 
 8   sector                    10 non-null     object 
 9   previous_rank             10 non-null     int64  
 10  country                   10 non-null     object 
 11  hq_location               10 non-null     object 
 12  website                   10 non-null     object 
 13  years_on_global_500_list  10 non-null     int64  
 14  em
```
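The difference between the two calls above can be tried on a small, hypothetical dataframe (the labels and numbers below are made up, not taken from f500.csv). `head()` returns a new dataframe you can keep working with, while `info()` only prints a summary.

```python
import pandas as pd

# Hypothetical miniature stand-in for f500; labels and values are illustrative only.
df = pd.DataFrame(
    {"rank": [1, 2, 3, 4, 5, 6], "revenues": [600, 500, 400, 300, 200, 100]},
    index=["A", "B", "C", "D", "E", "F"],
)

first_three = df.head(3)   # first n rows, in index order
default_head = df.head()   # n defaults to 5

print(first_three.shape)   # (3, 2)
print(default_head.shape)  # (5, 2)

# info() prints index range, non-null counts and dtypes; it returns None.
df.info()
```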

### Vectorized Operations
1. Subtract the values in the rank column from the values in the previous_rank column. Assign the result to rank_change.

In [3]:
```python
# Element-wise subtraction across all rows at once
rank_change = f500["previous_rank"] - f500["rank"]
```
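A minimal sketch of what the vectorized subtraction does, using a hypothetical three-row dataframe. It also shows that pandas matches the two operands by index label rather than by position.

```python
import pandas as pd

# Hypothetical companies and ranks, for illustration only.
df = pd.DataFrame(
    {"rank": [1, 2, 3], "previous_rank": [1, 4, 2]},
    index=["Company A", "Company B", "Company C"],
)

# One expression subtracts every row at once; no Python loop is needed.
# Positive values mean the company climbed, negative values mean it fell.
rank_change = df["previous_rank"] - df["rank"]  # values 0, 2, -1

# Operands are matched by index label, not by position:
# reversing one Series before subtracting gives the same per-company result.
realigned = df["previous_rank"] - df["rank"][::-1]
```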

### Series Data Exploration Methods
1. Use the Series.max() method to find the maximum value for the rank_change series. Assign the result to the variable rank_change_max.
2. Use the Series.min() method to find the minimum value for the rank_change series. Assign the result to the variable rank_change_min.
3. After running your code, use the variable inspector to view the new variables you created.

In [4]:
```python
rank_change_max = rank_change.max()
rank_change_min = rank_change.min()
```
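`max()` and `min()` return only the extreme values. A natural follow-up is `idxmax()`/`idxmin()`, which return the index label where those values occur; here that would be the company name. The series below is hypothetical.

```python
import pandas as pd

# Hypothetical rank changes, indexed by made-up company labels.
rank_change = pd.Series([0, 2, -1, 5], index=["A", "B", "C", "D"])

rank_change_max = rank_change.max()   # largest value: 5
rank_change_min = rank_change.min()   # smallest value: -1

# Index labels of the extremes, i.e. which company climbed or fell the most
biggest_climber = rank_change.idxmax()  # "D"
biggest_faller = rank_change.idxmin()   # "C"
```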

### Series Describe Method
1. Return a series of descriptive statistics for the rank column in f500.
    - Select the rank column. Assign it to a variable named rank.
    - Use the Series.describe() method to return a series of statistics for rank. Assign the result to rank_desc.
2. Return a series of descriptive statistics for the previous_rank column in f500.
    - Select the previous_rank column. Assign it to a variable named prev_rank.
    - Use the Series.describe() method to return a series of statistics for prev_rank. Assign the result to prev_rank_desc.
3. After you have run your code, use the variable inspector to view each of the new variables you created. Try to identify any potential issues with the data before moving on to the next screen.

In [5]:
```python
rank = f500["rank"]
rank_desc = rank.describe()

prev_rank = f500["previous_rank"]
prev_rank_desc = prev_rank.describe()
```
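To see which statistics `describe()` returns and how it can expose a data problem, here is a sketch on hypothetical values where 0 stands in for "no previous rank". A minimum of 0 is a red flag for a rank column, because ranks start at 1.

```python
import pandas as pd

# Hypothetical previous_rank values; 0 marks a company with no previous rank.
prev_rank = pd.Series([3, 0, 1, 0, 2])
prev_rank_desc = prev_rank.describe()

# For a numeric series: count, mean, std, min, 25%, 50%, 75%, max
print(list(prev_rank_desc.index))

# A minimum of 0 is suspicious for a rank column, since real ranks start at 1.
print(prev_rank_desc["min"])
```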

### Method Chaining
1. Use Series.value_counts() and Series.loc to return the number of companies with a value of 0 in the previous_rank column in the f500 dataframe. Assign the result to zero_previous_rank.
2. After running your code, use the variable inspector to view the new variable you created.

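In [6]:
```python
# value_counts() maps each distinct value to its frequency; .loc[0] looks up the count for 0
zero_previous_rank = f500["previous_rank"].value_counts().loc[0]
```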
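One caveat with the chain above: `.loc[0]` raises a `KeyError` if the value 0 never occurs. The sketch below, on hypothetical series, contrasts it with two chains that return 0 instead: `Series.get()` with a default, and summing a boolean comparison.

```python
import pandas as pd

# Hypothetical previous_rank values
prev_rank = pd.Series([3, 0, 1, 0, 2])
zero_previous_rank = prev_rank.value_counts().loc[0]   # 2

# If 0 never appears, .loc[0] would raise KeyError; these alternatives return 0.
no_zeros = pd.Series([3, 1, 2])
zero_count_get = no_zeros.value_counts().get(0, 0)     # default used when label is missing
zero_count_sum = (no_zeros == 0).sum()                 # True counts as 1, False as 0
```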

### Using Boolean Indexing with pandas Objects
1. Create a boolean series, motor_bool, that compares whether the values in the industry column from the f500 dataframe are equal to "Motor Vehicles and Parts".
2. Use the motor_bool boolean series to index the country column. Assign the result to motor_countries.
3. After running your code, use the variable inspector to view each of the new variables you created.

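In [10]:
```python
# True where the industry matches, False elsewhere
motor_bool = f500["industry"] == "Motor Vehicles and Parts"
# Keep only the matching rows and select the country column
motor_countries = f500.loc[motor_bool, "country"]
```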
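A small sketch of the same boolean-indexing pattern on a hypothetical dataframe (the companies and countries are made up). It shows that `.loc[mask, column]` and indexing the column Series with the mask select the same rows.

```python
import pandas as pd

# Hypothetical rows; companies and countries are invented for illustration.
df = pd.DataFrame(
    {
        "industry": ["Motor Vehicles and Parts", "Petroleum Refining", "Motor Vehicles and Parts"],
        "country": ["Germany", "USA", "Japan"],
    },
    index=["Company A", "Company B", "Company C"],
)

motor_bool = df["industry"] == "Motor Vehicles and Parts"   # one True/False per row

# Both forms keep only rows where motor_bool is True.
via_loc = df.loc[motor_bool, "country"]
via_series = df["country"][motor_bool]
```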

### Using Boolean Arrays to Assign Values
- Use boolean indexing to update values in the previous_rank column of the f500 dataframe:
    - There should now be a value of np.nan where there previously was a value of 0.
    - It is up to you whether you assign the boolean series to its own variable first, or whether you complete the operation in one line.
- Create a new pandas series, prev_rank_after, using the same syntax that was used to create the prev_rank_before series.
- After running your code, use the variable inspector to compare prev_rank_before and prev_rank_after.

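In [11]:
```python
# Frequency table before the change; dropna=False also counts NaN values
prev_rank_before = f500["previous_rank"].value_counts(dropna=False).head()
# Replace every 0 in previous_rank with NaN using a boolean row selector
f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan
prev_rank_after = f500["previous_rank"].value_counts(dropna=False).head()
```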
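As an alternative to the in-place `.loc` assignment above, `Series.mask()` returns a new series with NaN wherever a condition is True. The sketch below uses hypothetical values. It also shows that an integer series becomes float64 once it holds NaN, because NumPy's int64 has no missing-value marker.

```python
import numpy as np
import pandas as pd

# Hypothetical previous_rank values; 0 means "no previous rank".
prev = pd.Series([3, 0, 1, 0, 2], name="previous_rank")
before = prev.value_counts(dropna=False)

# mask() replaces values where the condition is True with NaN by default.
after = prev.mask(prev == 0)          # dtype becomes float64
after_counts = after.value_counts(dropna=False)

print(np.isnan(after).sum())          # number of NaN entries
```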

### Creating New Columns
1. Add a new column named rank_change to the f500 dataframe by subtracting the values in the rank column from the values in the previous_rank column.
2. Use the Series.describe() method to return a series of descriptive statistics for the rank_change column. Assign the result to rank_change_desc.
3. After running your code, use the variable inspector to view each of the new variables you created. Verify that the minimum value of the rank_change column is now greater than -500.

In [12]:
```python
# Rows whose previous_rank is NaN get NaN in rank_change
f500["rank_change"] = f500["previous_rank"] - f500["rank"]
rank_change_desc = f500["rank_change"].describe()
```
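Why the minimum of rank_change rises after the NaN replacement: arithmetic with NaN produces NaN, and `describe()` skips missing values. A sketch on a hypothetical three-row dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second company has no previous rank.
df = pd.DataFrame({"rank": [1, 2, 3], "previous_rank": [2.0, np.nan, 1.0]})

# NaN - 2 is NaN, so that row gets no rank_change value.
df["rank_change"] = df["previous_rank"] - df["rank"]   # 1.0, NaN, -2.0

# describe() ignores NaN: count is 2, and min/max come from the other two rows.
rank_change_desc = df["rank_change"].describe()
```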

### Challenge: Top Performers by Country
1. Create a series, industry_usa, containing counts of the two most common values in the industry column for companies headquartered in the USA.
2. Create a series, sector_china, containing counts of the three most common values in the sector column for companies headquartered in China.

In [18]:
```python
# Filter rows by country and select one column in a single .loc call
industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().head(2)
sector_china = f500.loc[f500["country"] == "China", "sector"].value_counts().head(3)
industry_usa  # display the result
```

```
Banks: Commercial and Savings               8
Insurance: Property and Casualty (Stock)    7
Name: industry, dtype: int64
```
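The challenge filters one country at a time. If you wanted the top values for every country at once, one option (not part of the original exercise) is to group by country and keep the first rows of each group's `value_counts()`. The sketch below uses hypothetical data with no ties, so the "top" value in each group is unambiguous.

```python
import pandas as pd

# Hypothetical rows; industries and countries are invented for illustration.
df = pd.DataFrame({
    "country": ["USA", "USA", "USA", "China", "China", "China"],
    "industry": ["Banks", "Banks", "Insurance", "Utilities", "Utilities", "Banks"],
})

# Single country: filter with .loc, count, keep the top n.
top_usa = df.loc[df["country"] == "USA", "industry"].value_counts().head(1)

# All countries: value_counts() within each group is sorted by frequency,
# so head(1) per group keeps each country's most common industry.
per_country = df.groupby("country")["industry"].value_counts()
top_per_country = per_country.groupby(level=0).head(1)
```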