## Project 01: Exploring Wikipedia Page Metrics

- **Due**: Thursday, 27 September 2018; 12:00pm
- **Total Points**: 70
    - correctly parsed list of authors, 15 points
    - working functions, 20 points
    - save and upload CSV file, 5 points
    - interactive graphics, 15 points
    - analysis, 15 points

In this project, you'll work on a task similar to Tutorial-11 by
extracting page metrics from a set of Wikipedia pages and analyzing
the results. Please pick one of the following lists of pages to work
with (I've picked these because the format is similar to the American
novelists; if you want to do something different just ask!):

- https://en.wikipedia.org/wiki/List_of_Irish_writers
- https://en.wikipedia.org/wiki/List_of_Chinese_writers
- https://en.wikipedia.org/wiki/List_of_French-language_authors
- https://en.wikipedia.org/wiki/List_of_Indian_writers
- https://en.wikipedia.org/wiki/List_of_Black_British_writers

Once you have picked your set of pages, complete sections (1)-(5)
below. Submit the Jupyter notebook within the project01 directory
on GitHub along with your CSV file (please do not upload all of the JSON
files). Also, make sure to upload the fully executed version of the notebook.

### Section 01: Grabbing the data

In code blocks below (please add or delete blocks as needed), grab all of the
Wikipedia pages for the writers on your list. Pay attention to any issues that
arise in this process, such as needing to truncate the start or end of the list.
Note that you will need to copy the `wiki.py` file into the directory where you
put your project code.
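
One way to handle the truncation issue mentioned above is to slice the list of links between two known entries. The sketch below is only an illustration: `trim_links` is a hypothetical helper, and the link names are made-up placeholders rather than real output from `wiki.py`.

```python
def trim_links(links, first, last):
    """Keep only the links from `first` through `last`, inclusive.

    Hypothetical helper: assumes `links` is a list of page-name strings
    and that both `first` and `last` appear in it, in that order.
    """
    start = links.index(first)
    end = links.index(last, start)
    return links[start:end + 1]


# Placeholder page names; a real list would come from the list page.
raw_links = ['Main_Page', 'Author_A', 'Author_B', 'Author_C', 'Help:Contents']
author_links = trim_links(raw_links, 'Author_A', 'Author_C')
print(author_links)
```

Picking the first and last author by name, rather than by position, keeps the slice readable if the surrounding navigation links change.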

In [None]:
import wiki

### Section 02: Defining functions

Here, unlike in Tutorial 11, we will develop a set of functions to help parse our
Wikipedia data. I've filled in some of the details for you, but you'll need to
fill in the rest.

Note that by default the functions will run, but return the same result for every
page. The next section contains some code to help test your results.

Also note that you should not need to modify `links_to_dataframe` or `wiki_page_metrics`,
but you will need to change the remaining functions.

In [None]:
def links_to_dataframe(links):
    """Takes a list of links and returns a pandas data frame of metrics.
    
    You should not need to modify this function. It takes a list of all
    the links that you want to study and returns a single pandas data frame
    with all of the variables of interest.
    
    Args:
        links: List of character strings indicating the pages to study.
    Returns:
        A pandas.DataFrame object with data about each page.
    """
    
    # Create initial empty lists
    author_name = []
    num_langs = []
    num_links = []
    num_chars = []
    num_elinks = []
    num_images = []
    num_sections = []
    
    # Fill data for each link
    import wiki
    
    for link in links:
        data = wiki.get_wiki_json(link)
        metrics = wiki_page_metrics(data)
        author_name.append(metrics['author_name'])
        num_langs.append(metrics['num_langs'])
        num_links.append(metrics['num_links'])
        num_chars.append(metrics['num_chars'])
        num_elinks.append(metrics['num_elinks'])
        num_images.append(metrics['num_images'])
        num_sections.append(metrics['num_sections'])
        
    # Convert lists to DataFrame and return results
    import collections

    df = collections.OrderedDict()
    df['author_name'] = author_name
    df['url'] = links
    df['num_langs'] = num_langs
    df['num_links'] = num_links
    df['num_chars'] = num_chars
    df['num_elinks'] = num_elinks
    df['num_images'] = num_images
    df['num_sections'] = num_sections

    import pandas as pd
    df = pd.DataFrame(df)
    
    return df

In [None]:
def wiki_page_metrics(data):
    """Takes JSON page data and returns notability metrics from the page.
    
    You should not need to modify this function. It calls all of the other
    metric functions, which you will need to modify and create.
    
    Args:
        data: The raw page data from the function `wiki.get_wiki_json`.
    Returns:
        A dictionary containing metric data values.
    """
    metrics = {
        'author_name': page_title(data),
        'num_langs': page_num_langs(data),
        'num_links': page_num_links(data),
        'num_chars': page_num_chars(data),
        'num_elinks': page_num_elinks(data),
        'num_images': page_num_images(data),
        'num_sections': page_num_sections(data)
    }
    
    return metrics

In [None]:
def page_title(data):
    """Extract title from JSON data.
    """
    return 'Taylor Arnold'

In [None]:
def page_num_langs(data):
    """Extract number of language links from JSON data.
    """
    return 1

In [None]:
def page_num_links(data):
    """Extract number of internal links from JSON data.
    """
    return 1

In [None]:
def page_num_chars(data):
    """Extract length of text in characters from JSON data.
    """
    return 1

In [None]:
def page_num_elinks(data):
    """Extract number of external links from JSON data.
    """
    return 1

In [None]:
def page_num_images(data):
    """Extract number of images from JSON data.
    """
    return 1

In [None]:
def page_num_sections(data):
    """Extract number of sections from JSON data.
    """
    return 1
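
As a rough guide to filling in the stubs above, the sketch below shows one possible shape for a metric function. It assumes that `data` is a dictionary with a `'sections'` key holding a list, as in the output of the MediaWiki `action=parse` API; the object returned by `wiki.get_wiki_json` may be organized differently, so inspect `data.keys()` before relying on any key name. `count_items` and `fake_data` are hypothetical names used only for illustration.

```python
def count_items(data, key):
    """Return the length of the list stored under `key`, or 0 if absent.

    Assumption: the page data is a dict whose values are lists for the
    fields being counted. Check this against the real JSON.
    """
    return len(data.get(key, []))


# Hand-made stand-in for page data, not real Wikipedia output.
fake_data = {
    'title': 'Example Author',
    'sections': [{'line': 'Early life'}, {'line': 'Works'}],
    'images': ['Portrait.jpg'],
}

print(count_items(fake_data, 'sections'))
print(count_items(fake_data, 'images'))
print(count_items(fake_data, 'langlinks'))
```

Returning 0 for a missing key is a design choice: it keeps `links_to_dataframe` from failing on a page that lacks a field, at the cost of hiding a misspelled key name.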

### Section 03: Test your code

Use the code below to test whether your script works on these three American
novelists.

In [None]:
links_to_dataframe(['Mark_Twain', 'John_Updike', 'Margaret_Walker'])
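
Because the stub functions return the same value for every page, one quick check is to look for metrics that never vary across the test pages. The sketch below assumes the results are available as a list of dictionaries (for a data frame, `df.to_dict('records')` produces that form); `constant_metrics` is a hypothetical helper and the numbers are made up.

```python
def constant_metrics(rows):
    """Return the keys whose value is identical in every row."""
    if not rows:
        return []
    return [key for key in rows[0]
            if len({row[key] for row in rows}) == 1]


# Made-up values: 'num_langs' is still returning the placeholder 1.
rows = [
    {'author_name': 'Mark Twain', 'num_langs': 1, 'num_sections': 30},
    {'author_name': 'John Updike', 'num_langs': 1, 'num_sections': 22},
    {'author_name': 'Margaret Walker', 'num_langs': 1, 'num_sections': 9},
]
print(constant_metrics(rows))
```

A metric that appears in this list for three quite different authors is probably still a stub.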

### Section 04: Save results as CSV file

Using the functions defined above, extract metadata for your list of authors
using the `links_to_dataframe` function and save the results as a CSV file in
the code below. You can call the file anything reasonable.
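
A minimal sketch of the save step, assuming the result of `links_to_dataframe` is a regular pandas data frame. The file name `authors.csv` and the sample row are placeholders; `index=False` leaves out the row numbers so the file contains only the metric columns.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the output of links_to_dataframe(...); values are made up.
df = pd.DataFrame({
    'author_name': ['Example Author'],
    'url': ['Example_Author'],
    'num_langs': [1],
})

# Written to a temporary directory here; in the project, save the file
# next to the notebook instead.
out_path = os.path.join(tempfile.mkdtemp(), 'authors.csv')
df.to_csv(out_path, index=False)

# Read the file back to confirm it was written as expected.
check = pd.read_csv(out_path)
print(check.columns.tolist())
```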

### Section 05: Graphics and analysis

In the code blocks below, produce two interactive graphics. Above each graphic, provide
a markdown cell of 4-5 sentences describing something interesting you found
in that graphic.