### `Putting it all together: Building a value-weighted index`
#### `01: Explore and clean company listing information`
To get started with the construction of a market-value-based index, you'll work with the combined listing information for the three largest US stock exchanges: the NYSE, the NASDAQ, and the AMEX.

In this and the next exercise, you will calculate market-cap weights for these stocks.
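
As a quick illustration of what a market-cap weight is, the sketch below divides each company's market capitalization by the total. The tickers and values are made up for illustration and do not come from the `listings` data.

```python
import pandas as pd

# Hypothetical market caps in USD mn (illustrative values, not from listings)
market_cap = pd.Series(
    {"AAA": 600.0, "BBB": 300.0, "CCC": 100.0},
    name="Market Capitalization",
)

# Each weight is the company's share of the total market capitalization
weights = market_cap / market_cap.sum()
print(weights)
```

By construction the weights sum to one, so larger companies move a value-weighted index more than smaller ones.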

We have already imported `pandas` as `pd` and loaded the `listings` data set, which holds listing information from the NYSE, NASDAQ, and AMEX. The column `'Market Capitalization'` is already measured in millions of USD.


- Inspect `listings` using `.info()`.
- Move the column `'Stock Symbol'` into the index (`inplace`).
- Drop all companies with missing `'Sector'` information from `listings`.
- Select companies with `'IPO Year'` before 2019.
- Inspect the result of the changes you just made using `.info()`.
- Show the number of companies per `'Sector'` using `.groupby()` and `.size()`. Sort the output in descending order.

In [24]:
import pandas as pd 

In [25]:
listings = pd.read_excel('datasets/listings_agg.xlsx',  sheet_name=0, index_col=0)
listings

Unnamed: 0,Exchange,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry
0,amex,XXII,"22nd Century Group, Inc",1.33,120.628,,Consumer Non-Durables,Farming/Seeds/Milling
1,amex,FAX,Aberdeen Asia-Pacific Income Fund Inc,5.00,1266.333,1986.0,,
2,amex,IAF,Aberdeen Australia Equity Fund Inc,6.15,139.865,,,
3,amex,CH,"Aberdeen Chile Fund, Inc.",7.22,67.563,,,
4,amex,ABE,Aberdeen Emerging Markets Smaller Company Oppo...,13.36,128.843,,,
...,...,...,...,...,...,...,...,...
6669,nyse,ZB^H,Zions Bancorporation,25.30,0.000,,,
6670,nyse,ZBK,Zions Bancorporation,28.86,0.000,,Finance,Major Banks
6671,nyse,ZOES,"Zoe's Kitchen, Inc.",17.07,332.556,2014.0,Consumer Services,Restaurants
6672,nyse,ZTS,Zoetis Inc.,53.10,26105.443,2013.0,Health Care,Major Pharmaceuticals


In [26]:
# Inspect listings
print(listings.info())

# Move 'Stock Symbol' into the index
listings.set_index('Stock Symbol', inplace=True)

# Drop rows with missing 'Sector' data
listings.dropna(subset=['Sector'], inplace=True)

# Select companies with IPO Year before 2019
# (rows with a missing 'IPO Year' are dropped too, because NaN < 2019 is False)
listings = listings[listings['IPO Year'] < 2019]

# Inspect the new listings data
print(listings.info())

# Show the number of companies per sector
print(listings.groupby('Sector').size().sort_values(ascending=False))


<class 'pandas.core.frame.DataFrame'>
Int64Index: 6674 entries, 0 to 66
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Exchange               6674 non-null   object 
 1   Stock Symbol           6674 non-null   object 
 2   Company Name           6674 non-null   object 
 3   Last Sale              6590 non-null   float64
 4   Market Capitalization  6674 non-null   float64
 5   IPO Year               2852 non-null   float64
 6   Sector                 5182 non-null   object 
 7   Industry               5182 non-null   object 
dtypes: float64(3), object(5)
memory usage: 469.3+ KB
None
<class 'pandas.core.frame.DataFrame'>
Index: 2349 entries, ACU to ZTO
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Exchange               2349 non-null   object 
 1   Company Name           2349 non-null   object 
 2   La

In [27]:
listings.groupby('Sector').size().sort_values(ascending=False)

Sector
Health Care              445
Consumer Services        402
Technology               386
Finance                  351
Energy                   144
Capital Goods            143
Public Utilities         104
Basic Industries         104
Consumer Non-Durables     89
Miscellaneous             68
Transportation            58
Consumer Durables         55
dtype: int64
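
The sector counts above add up to 2349 companies, well below the 5182 rows that had `'Sector'` information. One likely reason is that a comparison with `NaN` evaluates to `False`, so the `'IPO Year'` filter also removes every row whose IPO year is missing. The sketch below shows this on a small made-up frame; the tickers, years, and the name `kept_with_missing` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: one company has no recorded IPO year
toy = pd.DataFrame(
    {"IPO Year": [1986.0, np.nan, 2014.0, 2020.0]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

# NaN < 2019 is False, so "BBB" is excluded along with "DDD"
mask = toy["IPO Year"] < 2019
filtered = toy[mask]
print(filtered)

# If rows with a missing year should be kept, include them explicitly
kept_with_missing = toy[mask | toy["IPO Year"].isna()]
print(kept_with_missing)
```

Whether to keep companies without an IPO year is a design choice; the exercise as written keeps only those with a known year before 2019.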

#### `02: Select and inspect index components`
Now that you have imported and cleaned the listings data, you can proceed to select the index components as the largest company for each sector by market capitalization.

You'll also have the opportunity to take a closer look at the components, their last market value, and last price.


We have already imported `pandas` as `pd`, and loaded the `listings` data with the modifications you made during the last exercise.

- Use `.groupby()` and `.nlargest()` to select the largest company by `'Market Capitalization'` for each `'Sector'`, and assign the result to `components`.
- Print `components`, sorted in descending order by market cap.
- Select `Stock Symbol` from the `index` of `components`, assign it to `tickers` and print the result.
- Create a list `info_cols` that holds the column names `Company Name`, `Market Capitalization`, and `Last Sale`. Next, use `.loc[]` with `tickers` and `info_cols` to `print()` more details about the listings, sorted in descending order by `Market Capitalization`.

In [28]:
listings = pd.read_excel('datasets/listings_agg.xlsx',  sheet_name=0, index_col=0)
listings

Unnamed: 0,Exchange,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry
0,amex,XXII,"22nd Century Group, Inc",1.33,120.628,,Consumer Non-Durables,Farming/Seeds/Milling
1,amex,FAX,Aberdeen Asia-Pacific Income Fund Inc,5.00,1266.333,1986.0,,
2,amex,IAF,Aberdeen Australia Equity Fund Inc,6.15,139.865,,,
3,amex,CH,"Aberdeen Chile Fund, Inc.",7.22,67.563,,,
4,amex,ABE,Aberdeen Emerging Markets Smaller Company Oppo...,13.36,128.843,,,
...,...,...,...,...,...,...,...,...
6669,nyse,ZB^H,Zions Bancorporation,25.30,0.000,,,
6670,nyse,ZBK,Zions Bancorporation,28.86,0.000,,Finance,Major Banks
6671,nyse,ZOES,"Zoe's Kitchen, Inc.",17.07,332.556,2014.0,Consumer Services,Restaurants
6672,nyse,ZTS,Zoetis Inc.,53.10,26105.443,2013.0,Health Care,Major Pharmaceuticals


In [32]:
components = listings.groupby(['Sector'])['Market Capitalization'].nlargest(1)
components

Sector                   
Basic Industries       58    230159.644
Capital Goods          63    155660.252
Consumer Durables      35     48398.936
Consumer Non-Durables  42    183655.305
Consumer Services      36    422138.531
Energy                 46    338728.714
Finance                50    300283.250
Health Care            51    338834.390
Miscellaneous          36    275525.000
Public Utilities       38    247339.517
Technology             36    740024.467
Transportation         64     90180.887
Name: Market Capitalization, dtype: float64
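
The remaining steps of this exercise (sorting `components`, extracting `tickers`, and looking up `info_cols`) could be sketched as follows. This is a minimal example on made-up data, and it assumes `'Stock Symbol'` has already been moved into the index as in the previous exercise; with the freshly reloaded frame above, the second index level holds integer row labels instead. The company names, tickers, values, and the name `details` are hypothetical.

```python
import pandas as pd

# Toy listings with 'Stock Symbol' as the index (illustrative values only)
listings = pd.DataFrame(
    {
        "Company Name": ["Alpha", "Beta", "Gamma", "Delta"],
        "Last Sale": [10.0, 20.0, 30.0, 40.0],
        "Market Capitalization": [500.0, 1500.0, 900.0, 300.0],
        "Sector": ["Energy", "Energy", "Finance", "Finance"],
    },
    index=pd.Index(["AAA", "BBB", "CCC", "DDD"], name="Stock Symbol"),
)

# Largest company per sector; the result has a (Sector, Stock Symbol) MultiIndex
components = listings.groupby("Sector")["Market Capitalization"].nlargest(1)
print(components.sort_values(ascending=False))

# Pull the ticker level out of the MultiIndex
tickers = components.index.get_level_values("Stock Symbol")
print(tickers)

# Look up more details for the selected tickers
info_cols = ["Company Name", "Market Capitalization", "Last Sale"]
details = listings.loc[tickers, info_cols].sort_values(
    "Market Capitalization", ascending=False
)
print(details)
```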