Here is an outline of the project
1. Why Clarendon Analysis
   * About the Clarendon
   * Define Project Motivations
   * Define Project Goals.


2. Get the Data.
    * Scrape the data from `https://www.ox.ac.uk/clarendon/scholar-class-lists/scholars-2020-21` 
    * Save the data into a data frame
    
    
    
3. Analysis of the data
   * Determine the country with the highest scholars
   * Determine the number of African Scholars
   * Determine the number of Nigerian Scholars
   * College with the highest number of Scholars
   * Plot a visual of Scholars and courses
   

# 1. Why Clarendon Analysis

After few years of graduation from undergraduates studies, I have continually searched for scholarships and funding opportunities to further my studies. Never heard of this particular one. It was posted on linkedin by a third party connect on Linkedin. It piqued my curiosity. I did some digging on their website, found some helpful information and  the list of previous scholars. I became even more curious to learn the number of Nigerias or Africans that have enjoyed the scholarship since it began. A good opportunity to apply my programming skills to find answers to all my curiosity. Thus the need for this project. 


Clarendon not only offers over 150 new, fully-funded scholarships each year to assist outstanding graduate scholars, but offers the opportunity to join one of the most active, highly international, and multidisciplinary communities at Oxford.

Originally established to support Overseas students, the Clarendon Fund first welcomed scholars to Oxford in 2001. The scheme was expanded in 2012 to include students from the UK and EU, therefore providing funding for all fee statuses. Throughout this period, the Fund’s aim has remained unchanged; to assist academically outstanding graduate students through their studies at the University of Oxford.

# 2. Getting the Data
You can find the data used from this analysis [here](https://www.ox.ac.uk/clarendon/scholar-class-lists/scholars-2020-21)

In [17]:
import pandas as pd
import numpy as np
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"}

from bs4 import BeautifulSoup

In [18]:
url = "https://www.ox.ac.uk/clarendon/scholar-class-lists"

In [19]:
html_doc = requests.get(url, headers=headers).text
soup = BeautifulSoup(html_doc, "html5lib")


In [20]:
# Get the all page content
page_content = soup.find("nav", id="block-menu-block-10")
page_content


<nav aria-label="Tertiary navigation" class="block block-menu-block block-menu-block-10" id="block-menu-block-10">

  
        <h2 class="title"><a class="active-trail active-trail active" href="/clarendon/scholar-class-lists">Scholar class lists</a></h2>
    
  <ul><li class="first leaf menu-mlid-8570"><a href="/clarendon/scholar-class-lists/scholars-2020-21">Scholars 2020-21</a></li>
<li class="leaf menu-mlid-8573"><a href="/clarendon/scholar-class-lists/scholars-2019-20">Scholars 2019-20</a></li>
<li class="leaf menu-mlid-9405"><a href="/clarendon/scholar-class-lists/scholars-2018-19">Scholars 2018-19</a></li>
<li class="leaf menu-mlid-8883"><a href="/clarendon/scholar-class-lists/scholars-2017-18">Scholars 2017-18</a></li>
<li class="leaf menu-mlid-5679"><a href="/clarendon/scholar-class-lists/scholars-2016-17">Scholars 2016-17</a></li>
<li class="last leaf menu-mlid-8582"><a href="/clarendon/scholar-class-lists/previous-scholars">Previous scholars</a></li>
</ul>

  



  
</nav>

In [21]:
# Grab the list to all the page content
all_pages = page_text.find_all("li")
all_pages

[<li class="first leaf menu-mlid-8570"><a href="/clarendon/scholar-class-lists/scholars-2020-21">Scholars 2020-21</a></li>,
 <li class="leaf menu-mlid-8573"><a href="/clarendon/scholar-class-lists/scholars-2019-20">Scholars 2019-20</a></li>,
 <li class="leaf menu-mlid-9405"><a href="/clarendon/scholar-class-lists/scholars-2018-19">Scholars 2018-19</a></li>,
 <li class="leaf menu-mlid-8883"><a href="/clarendon/scholar-class-lists/scholars-2017-18">Scholars 2017-18</a></li>,
 <li class="leaf menu-mlid-5679"><a href="/clarendon/scholar-class-lists/scholars-2016-17">Scholars 2016-17</a></li>,
 <li class="last leaf menu-mlid-8582"><a href="/clarendon/scholar-class-lists/previous-scholars">Previous scholars</a></li>]

In [26]:
# Function that generate all the scholars from the content page
def generate_dataframe(pages):
    df = pd.DataFrame()
    for i in range(len(all_pages)):
        query = "-".join(all_pages[i].string.split(" "))
        page_url = f"https://www.ox.ac.uk/clarendon/scholar-class-lists/{query}"

        page = requests.get(page_url, headers=headers).text
        doc = BeautifulSoup(page, "html5lib")

        tables1 = doc.find_all("table", class_="sort-table table-striping")
        df  = pd.read_html(str(tables1), flavor="bs4")
    return df
    




In [27]:
# List of all scholars to date
list_of_scholars = generate_dataframe(all_pages)

In [28]:
len(list_of_scholars)

15

In [33]:
# verify the total numbers of scholars to date
total_scholars = sum([len(list_of_scholars[i]) for i in range(len(list_of_scholars))])
total_scholars

612

In [35]:
# Create scholar Dataframe for further analysis
scholars_df = pd.concat(list_of_scholars, ignore_index=True)
scholars_df.head()

Unnamed: 0,Name,Country,Course,College
0,Aayush Srivastava,India,DPhil in Geography and the Environment,Hertford College
1,Abhishek Raman Parajuli,Nepal,MPhil in Politics: Comparative Government,Nuffield College
2,Abigail D'Cruz,Australia,DPhil in Clinical Medicine,Keble College
3,Adam Prosinski,Poland,DPhil in Partial Differential Equations: Analy...,St John's College
4,Aisha Ahmad,Pakistan,MPhil in Development Studies,St Cross College


In [40]:
# Save the clarendon/scholar-class-lists to file
scholars_df.to_excel("clarendon-scholar-class-lists.xlsx", index=False)

In [None]:
##