# Homework 3

## Question 1

Fidelity Investments offers a number of "sector" mutual funds, as shown on [this page](https://fundresearch.fidelity.com/mutual-funds/category-performance-annual-total-returns/SECTOR).

Write code to download this page and extract the following variables:
* Fund name
* Fund ticker symbol
* Fund inception date
* Investment category

Store the records in a list of tuples containing the investment category, fund name, ticker, and year of fund inception. For example, your list should have records like this:

`[('Consumer Cyclical', 'Consumer Discretionary', 'FSCPX', 1990),
 ('Consumer Cyclical', 'Leisure', 'FDLSX', 1984),
 ('Consumer Cyclical', 'Retailing', 'FSRPX', 1985), ...`
 

Notice that extraneous words like "Fidelity Select" and "Portfolio", which are common to all the funds, have been removed from the fund names. Note also that the inception year is an integer.
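The cleanup described above can also be done after extraction instead of inside the scraping regex. Here is a minimal sketch, assuming raw names look like "Fidelity Select Consumer Discretionary Portfolio" and inception dates are formatted mm/dd/yyyy (both are assumptions about the page). `clean_fund_name` and `inception_year` are hypothetical helper names, and the date in the example is made up:

```python
import re
from datetime import datetime

def clean_fund_name(raw_name):
    """Strip a leading "Fidelity"/"Fidelity Select" and a trailing "Fund"/"Portfolio"."""
    name = re.sub(r'^Fidelity\s+(?:Select\s+)?', '', raw_name)
    name = re.sub(r'(?:\s+Fund)?(?:\s+Portfolio)?$', '', name)
    return name.strip()

def inception_year(date_text):
    """Convert an mm/dd/yyyy date string to an integer year."""
    return datetime.strptime(date_text, '%m/%d/%Y').year

print(clean_fund_name('Fidelity Select Consumer Discretionary Portfolio'))  # Consumer Discretionary
print(inception_year('06/29/1990'))  # 1990 (made-up date)
```

Parsing the date with `strptime` rather than slicing the string also rejects malformed dates, which can catch a regex that captured the wrong cell.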

In [None]:
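import re
import requests

url = 'https://fundresearch.fidelity.com/mutual-funds/category-performance-annual-total-returns/SECTOR'
req = requests.get(url)
req.raise_for_status()  # stop on an HTTP error instead of parsing an error page
html = req.text

# Groups: (1) fund name minus "Fidelity", "Select", "Fund" and "Portfolio"; (2) ticker;
# (3) inception year from an mm/dd/yyyy date; (4) investment category.
# The " Fund"/" Portfolio" suffixes are optional so names without them still match.
ptrn = (r'href="https://fundresearch\.fidelity\.com/mutual-funds/summary/\d+">'
        r'Fidelity (?:Select )?(.*?)(?: Fund)?(?: Portfolio)? \((\w+)\)'
        r'.*?<td align="center">\d+/\d+/(\d+)</td>\s*<td>(.*?)</td>')
matches = re.findall(ptrn, html, re.DOTALL)

# Reorder each match into (category, name, ticker, year) with the year as an int
funds = [(category, name, ticker, int(year)) for name, ticker, year, category in matches]

funds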
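Because the live page can change, it can help to test a scraping pattern offline first. The sketch below runs a pattern with the same structure as the one above (optional " Fund"/" Portfolio" suffixes, whitespace-tolerant gap between cells) against a hypothetical HTML fragment; the markup, names, tickers, and dates are all made up and may not match the real Fidelity page:

```python
import re

# Hypothetical table rows shaped like the markup the pattern targets
sample_html = (
    '<td><a href="https://fundresearch.fidelity.com/mutual-funds/summary/111111111">'
    'Fidelity Select Example Portfolio (XMPLX)</a></td>\n'
    '\t\t\t\t\t<td align="center">01/02/1990</td>\n'
    '\t\t\t\t\t<td>Category A</td>\n'
    '<td><a href="https://fundresearch.fidelity.com/mutual-funds/summary/222222222">'
    'Fidelity Select Sample (SMPLX)</a></td>\n'
    '\t\t\t\t\t<td align="center">12/16/1985</td>\n'
    '\t\t\t\t\t<td>Category B</td>\n'
)

pattern = (r'href="https://fundresearch\.fidelity\.com/mutual-funds/summary/\d+">'
           r'Fidelity (?:Select )?(.*?)(?: Fund)?(?: Portfolio)? \((\w+)\)'
           r'.*?<td align="center">\d+/\d+/(\d+)</td>\s*<td>(.*?)</td>')

# findall returns (name, ticker, year, category); reorder and convert the year
sample_matches = re.findall(pattern, sample_html, re.DOTALL)
sample_rows = [(category, name, ticker, int(year))
               for name, ticker, year, category in sample_matches]
print(sample_rows)
# [('Category A', 'Example', 'XMPLX', 1990), ('Category B', 'Sample', 'SMPLX', 1985)]
```

The second row has no "Portfolio" suffix, which shows why the suffix groups are written as optional (`?`) with the leading space inside the group.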

How many funds are there? Your answer should come from a single line of code.

In [None]:
len(funds)

## Question 2

Now process the records in your list to create a new dictionary with *sector* (the investment category) as the key and a list of fund tickers as the value.

Your result should look like this:

`{'Consumer Cyclical': ['FSCPX', 'FDLSX', 'FSRPX', 'FSHOX', 'FSAVX', 'FBMPX'],
 'Consumer Defensive': ['FDFAX'], ...`
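One common way to build this kind of grouping is `collections.defaultdict(list)`, which creates an empty list the first time a key is used. A minimal sketch on sample records; the first three come from the Question 1 example and the last one is made up:

```python
from collections import defaultdict

# Sample records in the (category, name, ticker, year) layout from Question 1
sample_funds = [
    ('Consumer Cyclical', 'Consumer Discretionary', 'FSCPX', 1990),
    ('Consumer Cyclical', 'Leisure', 'FDLSX', 1984),
    ('Consumer Cyclical', 'Retailing', 'FSRPX', 1985),
    ('Example Sector', 'Example', 'XMPLX', 2000),
]

# No "if key not in dict" check is needed: a missing key starts as an empty list
sample_sectors = defaultdict(list)
for category, _name, ticker, _year in sample_funds:
    sample_sectors[category].append(ticker)

print(dict(sample_sectors))
# {'Consumer Cyclical': ['FSCPX', 'FDLSX', 'FSRPX'], 'Example Sector': ['XMPLX']}
```

Tickers stay in the order the funds appear in the list, because each `append` adds to the end.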

In [None]:
sectors = {}

# Group fund tickers by investment category (the first element of each tuple)
for category, name, ticker, year in funds:
    sectors.setdefault(category, []).append(ticker)

sectors

How many sectors are there? Again, answer with just one line of code.

In [None]:
len(sectors)

## Question 3

Print a table of every year with at least one fund inception, sorted from oldest to newest, showing how many funds were started in that year.

Hint: It will probably help to start by constructing a new dictionary.
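As an alternative to pandas, the counting dictionary can come from `collections.Counter`, and sorting its keys gives the oldest-to-newest order. A minimal sketch; the first three records come from the Question 1 example and the last one is made up so that one year has two funds:

```python
from collections import Counter

# Sample records in the (category, name, ticker, year) layout; only the years matter here
sample_funds = [
    ('Consumer Cyclical', 'Consumer Discretionary', 'FSCPX', 1990),
    ('Consumer Cyclical', 'Leisure', 'FDLSX', 1984),
    ('Consumer Cyclical', 'Retailing', 'FSRPX', 1985),
    ('Example Sector', 'Example', 'XMPLX', 1985),
]

# Counter tallies how many times each year appears
sample_year_counts = Counter(year for _category, _name, _ticker, year in sample_funds)

# Sorting the integer keys prints the years from oldest to newest
print(f'{"Year":<6}{"Funds":>6}')
for year in sorted(sample_year_counts):
    print(f'{year:<6}{sample_year_counts[year]:>6}')
```

Since the years are stored as integers, `sorted` orders them numerically rather than as strings.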

In [None]:
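import pandas as pd

# Count fund inceptions per year (the year is the last element of each tuple)
year_counts = {}
for fund in funds:
    year = fund[3]
    year_counts[year] = year_counts.get(year, 0) + 1

# Build a one-column table indexed by year, sorted from oldest to newest
years = pd.DataFrame.from_dict(year_counts, orient='index', columns=['Number of Funds Started'])
years = years.sort_index()
years.index.name = 'Inception Year'

print(years)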

## Question 4

Morningstar provides detailed information about mutual funds, some of it available for free.

[This page](https://portfolios.morningstar.com/fund/holdings?t=FBSOX&region=usa&culture=en-US) displays holdings information for the top 25 stocks held by one of the funds.

If you look at the page source, though, you won't find the data there; it's actually coming from another web server. The following URL returns a string of JSON-formatted text containing the information used to build that page:

https://portfolios.morningstar.com/portfo/fund/ajax/holdings_tab?&t=FBSOX

To get started, write code that downloads this JSON for this one ticker and converts it to a dictionary called `jsn`.

In [None]:
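import json
import requests

url = 'https://portfolios.morningstar.com/portfo/fund/ajax/holdings_tab?&t=FBSOX'
req = requests.get(url)
req.raise_for_status()      # stop on an HTTP error instead of parsing an error page
jsn = json.loads(req.text)  # the whole response body is JSON
html = jsn['htmlStr']       # HTML fragment that holds the holdings table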

Now, `jsn` has a key `htmlStr` that contains the HTML with the information we want. Look at this HTML and the web page in your browser to come up with a regular expression you can use to extract the *ticker* and *company name* for all of the 25 holdings listed.
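Before pointing a regex at the live response, you can check it offline. The sketch below builds a hypothetical JSON body whose `htmlStr` value mimics the structure the holdings regex in this notebook expects (tabs after `href="`, a newline before `">`); the markup, tickers, and company names are made up and the real Morningstar HTML may differ:

```python
import json
import re

# Hypothetical response body; json.dumps escapes the tabs and newlines in the HTML
sample_body = json.dumps({
    'htmlStr': (
        '<th><a href="\t\t\t//quotes.morningstar.com/stock/AAA/quote\n">Alpha Corp</a></th>\n'
        '<th><a href="\t\t\t//quotes.morningstar.com/stock/BBB/quote\n">Beta Inc</a></th>\n'
    )
})

jsn_sample = json.loads(sample_body)   # JSON text -> dict
html_fragment = jsn_sample['htmlStr']  # the embedded HTML is a plain string

# Group 1: company ticker from the quote URL; group 2: link text (company name)
holding_pattern = r'href="\t\t\t//quotes\.morningstar\.com/stock/(\w+)/.*?\n">(.*?)</a></th>\n'
holdings = re.findall(holding_pattern, html_fragment, re.DOTALL)
print(holdings)  # [('AAA', 'Alpha Corp'), ('BBB', 'Beta Inc')]
```

With two capture groups, `re.findall` returns a list of 2-tuples, which is already the (ticker, name) shape the question asks for.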

Next, write code that extracts this information for each of the funds. When you're done, you should have a dictionary where each key is a fund ticker and each value is a list of (company ticker, company name) tuples. For example, the list for the key `FBSOX` should start like this:

`[('ACN', 'Accenture PLC A'),
 ('EPAM', 'EPAM Systems Inc'),
 ('IBM', 'International Business Machines Corp'),
 ('ADBE', 'Adobe Inc'), ...`
 
Note: Be careful to check that you end up with 25 unique holdings for each fund. If you get duplicate matches, recall that a `set` can only contain unique values, and you can convert a `list` to a set by calling `set(my_list)`.
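Converting to a `set` removes duplicates but does not keep the order in which holdings appear on the page. If order matters, one alternative is `dict.fromkeys`, which relies on dicts preserving insertion order (guaranteed since Python 3.7). The sample pairs below are made up:

```python
# Made-up matches in which one holding was captured twice
matches_with_dupes = [('AAA', 'Alpha Corp'), ('BBB', 'Beta Inc'),
                      ('AAA', 'Alpha Corp'), ('CCC', 'Gamma Ltd')]

unordered = set(matches_with_dupes)                # unique, but iteration order is arbitrary
ordered = list(dict.fromkeys(matches_with_dupes))  # unique, first-seen order kept

print(len(unordered))  # 3
print(ordered)         # [('AAA', 'Alpha Corp'), ('BBB', 'Beta Inc'), ('CCC', 'Gamma Ltd')]
```

Both approaches treat two tuples as duplicates only if every element matches, so the same ticker with a differently spelled name would survive either one.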

In [None]:
import json
import re
import requests

def get_fund_underlyings(fund_ticker):
    """Return a list of (company ticker, company name) tuples for a fund's top holdings."""
    url = f'https://portfolios.morningstar.com/portfo/fund/ajax/holdings_tab?&t={fund_ticker}'
    req = requests.get(url)
    req.raise_for_status()
    html = json.loads(req.text)['htmlStr']
    company_info = re.findall(r'href="\t\t\t//quotes.morningstar.com/stock/(\w+)/.*?\n">(.*?)</a></th>\n',
                              html, re.DOTALL)

    # Track seen tickers in a set so duplicate matches are skipped
    # while the order of holdings on the page is preserved
    companies = []
    seen = set()
    for ticker, name in company_info:
        if ticker not in seen:
            seen.add(ticker)
            companies.append((ticker, name))

    return companies

In [None]:
fund_underlyings = {}

for fund in funds:
    fund_ticker = fund[2]
    fund_underlyings[fund_ticker] = get_fund_underlyings(fund_ticker)
    
fund_underlyings
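To follow up on the earlier note about ending up with 25 unique holdings per fund, a small check can flag any fund that came back short or with duplicates. A minimal sketch; `funds_with_wrong_count` is a hypothetical helper name and the sample data is made up:

```python
def funds_with_wrong_count(underlyings, expected=25):
    """Map each problem fund to (number of holdings, number of unique tickers)."""
    problems = {}
    for fund_ticker, holdings in underlyings.items():
        unique_tickers = {ticker for ticker, _name in holdings}
        # Flag funds with the wrong total or with repeated tickers
        if len(holdings) != expected or len(unique_tickers) != len(holdings):
            problems[fund_ticker] = (len(holdings), len(unique_tickers))
    return problems

# Made-up data: one fund with 25 distinct holdings, one short fund with a duplicate
sample_underlyings = {
    'AAAAX': [(f'T{i}', f'Company {i}') for i in range(25)],
    'BBBBX': [('T0', 'Company 0'), ('T0', 'Company 0'), ('T1', 'Company 1')],
}
print(funds_with_wrong_count(sample_underlyings))  # {'BBBBX': (3, 2)}
```

In the notebook, `funds_with_wrong_count(fund_underlyings)` should return an empty dictionary if every fund was scraped cleanly; any entries point to funds whose HTML your regex may not handle.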