Here is an outline of the project:
1. Why Clarendon Analysis
   * About the Clarendon
   * Define Project Motivations
   * Define Project Goals

2. Get the Data
   * Scrape the data from `https://www.ox.ac.uk/clarendon/scholar-class-lists/scholars-2020-21`
   * Save the data into a data frame

3. Analysis of the Data
   * Determine the top ten countries with the most scholars
   * Determine the top five colleges by number of scholars
   * Determine the top five courses by number of scholars
   * Plot a visual of the top scholars and courses
   * Determine the number of African scholars
   * Determine the number of Nigerian scholars
   * Plot a visual of scholars and courses

# 1. Why Clarendon Analysis

In the few years since I graduated from my undergraduate studies, I have continually searched for scholarships and funding opportunities to further my studies, yet I had never heard of this one. A third-party connection posted it on LinkedIn, and it piqued my curiosity. I did some digging on the Clarendon website and found some helpful information, along with the lists of previous scholars. I became even more curious to learn how many Nigerians or Africans have benefited from the scholarship since it began. This was a good opportunity to apply my programming skills to answer those questions, hence this project.


Clarendon not only offers over 150 new, fully-funded scholarships each year to assist outstanding graduate scholars, but also offers the opportunity to join one of the most active, highly international, and multidisciplinary communities at Oxford.

Originally established to support overseas students, the Clarendon Fund first welcomed scholars to Oxford in 2001. The scheme was expanded in 2012 to include students from the UK and EU, thereby providing funding for all fee statuses. Throughout this period, the Fund’s aim has remained unchanged: to assist academically outstanding graduate students through their studies at the University of Oxford.

# 2. Getting the Data
You can find the data used for this analysis [here](https://www.ox.ac.uk/clarendon/scholar-class-lists/scholars-2020-21).

In [79]:
import pandas as pd
import numpy as np
import requests
import seaborn as sns
import plotly.express as px

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"}

from bs4 import BeautifulSoup

In [67]:
url = "https://www.ox.ac.uk/clarendon/scholar-class-lists"

In [85]:
html_doc = requests.get(url, headers=headers).text
clarendon_soup = BeautifulSoup(html_doc, "html5lib")


In [86]:
# Get the navigation menu that links to every class-list page
page_content = clarendon_soup.find("nav", id="block-menu-block-10")


In [70]:
# Grab the list items, one per class-list page
all_pages = page_content.find_all("li")
all_pages

[<li class="first leaf menu-mlid-8570"><a href="/clarendon/scholar-class-lists/scholars-2020-21">Scholars 2020-21</a></li>,
 <li class="leaf menu-mlid-8573"><a href="/clarendon/scholar-class-lists/scholars-2019-20">Scholars 2019-20</a></li>,
 <li class="leaf menu-mlid-9405"><a href="/clarendon/scholar-class-lists/scholars-2018-19">Scholars 2018-19</a></li>,
 <li class="leaf menu-mlid-8883"><a href="/clarendon/scholar-class-lists/scholars-2017-18">Scholars 2017-18</a></li>,
 <li class="leaf menu-mlid-5679"><a href="/clarendon/scholar-class-lists/scholars-2016-17">Scholars 2016-17</a></li>,
 <li class="last leaf menu-mlid-8582"><a href="/clarendon/scholar-class-lists/previous-scholars">Previous scholars</a></li>]
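Each list item above already carries the relative path of its page, so the page URLs can be read from the links instead of being rebuilt from the link text. The following is a minimal sketch of that idea, not the code used in this notebook: `extract_page_urls` and `sample_menu` are hypothetical names, the HTML is a two-item excerpt of the output above so the sketch runs offline, and it uses the built-in `html.parser` rather than `html5lib`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.ox.ac.uk"

# Two of the menu items shown above, inlined so the sketch runs without a network call.
sample_menu = """
<nav id="block-menu-block-10"><ul>
<li class="first leaf menu-mlid-8570"><a href="/clarendon/scholar-class-lists/scholars-2020-21">Scholars 2020-21</a></li>
<li class="last leaf menu-mlid-8582"><a href="/clarendon/scholar-class-lists/previous-scholars">Previous scholars</a></li>
</ul></nav>
"""


def extract_page_urls(html, base_url=BASE_URL):
    """Return an absolute URL for every link inside the class-list menu."""
    soup = BeautifulSoup(html, "html.parser")
    menu = soup.find("nav", id="block-menu-block-10")
    # urljoin resolves each relative href against the site root.
    return [urljoin(base_url, a["href"]) for a in menu.find_all("a", href=True)]


page_urls = extract_page_urls(sample_menu)
print(page_urls)
```

Reading the `href` avoids any mismatch between the link label (for example `Scholars 2020-21`) and the lowercase path the site actually uses.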

In [71]:
# Function that collects all the scholars listed on the class-list pages
def generate_df(pages):
    rows = []
    for page_item in pages:
        # Use the link's own href rather than rebuilding the path from its label
        href = page_item.find("a")["href"]
        page_url = f"https://www.ox.ac.uk{href}"

        page = requests.get(page_url, headers=headers).text
        soup = BeautifulSoup(page, "html5lib")

        # Each table row holds one scholar: name, country, course, college
        for row in soup.find("tbody").find_all("tr"):
            col = row.find_all("td")
            rows.append({"Name": col[0].text, "Country": col[1].text,
                         "Course": col[2].text, "College": col[3].text})

    # DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
    return pd.DataFrame(rows, columns=["Name", "Country", "Course", "College"])
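The row-by-row extraction can be tried without hitting the website by separating the parsing step from the download. The sketch below is an illustration under stated assumptions, not the notebook's own code: `parse_scholar_table` and `sample_page` are hypothetical, the rows are made up, and the parser is the built-in `html.parser`. Unlike the function above, it strips surrounding whitespace from each cell and skips rows that lack the four expected cells.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Name", "Country", "Course", "College"]

# Made-up rows in the same four-column layout as the class-list tables.
sample_page = """
<table><tbody>
<tr><td>Scholar A</td><td>India</td><td>DPhil in Computer Science</td><td>Exeter College</td></tr>
<tr><td>Scholar B</td><td> Egypt </td><td>DPhil in Economics</td><td>Nuffield College</td></tr>
<tr><td>Incomplete row</td></tr>
</tbody></table>
"""


def parse_scholar_table(html):
    """Turn the first <tbody> of a page into a DataFrame of scholars."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    records = []
    if tbody is not None:
        for row in tbody.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            # Skip rows that do not have the four expected cells.
            if len(cells) >= len(COLUMNS):
                records.append(dict(zip(COLUMNS, cells)))
    # Build the frame once, from a list of dicts, instead of appending row by row.
    return pd.DataFrame(records, columns=COLUMNS)


sample_df = parse_scholar_table(sample_page)
print(sample_df)
```

Keeping the parser free of network calls means it can be checked against a saved copy of a page before it is run over every class list.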

In [72]:
scholars = generate_df(all_pages)

In [73]:
# Drop the Name column for privacy reasons.
scholars_df = scholars.drop(columns="Name")

In [87]:
# Save the scraped scholar lists to an Excel file
scholars_df.to_excel("data/clarendon-scholars.xlsx", index=False)

# 3. Analysis of the Data

We will try to understand the data and answer the project questions by placing the data in a visual context, so that patterns, trends and correlations that might otherwise go undetected are exposed. I will be using `plotly.express` for the bar charts in this section (the `seaborn` library, which is based on Matplotlib, is imported as well). Both provide a high-level interface for creating attractive graphs.

In [89]:
# Load the Clarendon scholar data
scholar_data = pd.read_excel("data/clarendon-scholars.xlsx")

**a. Clarendon Scholars by Country**

In [90]:
# Fix the run-together "UnitedStatesofAmerica" spelling, and count England under United Kingdom
scholar_data["Country"] = (
    scholar_data["Country"]
    .str.replace("UnitedStatesofAmerica", "United States of America")
    .str.replace("England", "United Kingdom")
)
scholar_data

| | Country | Course | College |
|---|---|---|---|
| 0 | United States of America | DPhil in Sociology (PT) | Nuffield College |
| 1 | India | DPhil in Computer Science | Exeter College |
| 2 | India | DPhil in International Development | Lincoln College |
| 3 | Egypt | DPhil in Economics | Nuffield College |
| 4 | Australia | DPhil in Physiology, Anatomy and Genetics | Keble College |
| ... | ... | ... | ... |
| 779 | China | DPhil in Education | Lady Margaret Hall |
| 780 | China | DPhil in Management Studies | Exeter College |
| 781 | China | DPhil in Engineering Science | Balliol College |
| 782 | United States of America | MSc in Contemporary Chinese Studies | Brasenose College |
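If more spelling variants turn up, the chained replacements can be collected into a single lookup table. This is a minimal sketch on a made-up sample, assuming the only known variants are the two handled above; `COUNTRY_FIXES` is a hypothetical name. Note that `Series.replace` with a dictionary matches whole cell values, whereas `str.replace` substitutes substrings.

```python
import pandas as pd

# Made-up sample containing the two variants corrected above, plus stray whitespace.
countries = pd.Series(
    ["UnitedStatesofAmerica", "England", "India", "United Kingdom", " Egypt "]
)

# Hypothetical lookup table: scraped spelling -> preferred spelling.
COUNTRY_FIXES = {
    "UnitedStatesofAmerica": "United States of America",
    "England": "United Kingdom",
}

# Strip whitespace first so the whole-value lookup can match.
cleaned = countries.str.strip().replace(COUNTRY_FIXES)
print(cleaned.tolist())
```

Whole-value matching leaves any country name that merely contains one of the keys untouched, which is the safer default when the list of variants grows.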


In [91]:
scholar_by_country = scholar_data["Country"].value_counts()[:10]
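The slice above works because `value_counts` returns the counts sorted from largest to smallest, so the first ten entries are the ten most common countries. A small sketch on made-up data shows the same pattern with `head`, plus the share of each country via `normalize=True`:

```python
import pandas as pd

# Made-up country column, not the scraped data.
sample = pd.Series(
    ["China", "India", "China", "Egypt", "India", "China"], name="Country"
)

# Counts come back in descending order, so head(n) gives the top n.
top_two = sample.value_counts().head(2)

# normalize=True returns fractions of the total instead of raw counts.
shares = sample.value_counts(normalize=True)

print(top_two)
print(shares.round(2))
```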

In [92]:

def create_barchart(df, ylabel, title):
    fig = px.bar(df, x=df.values, y=df.index, text=df.values, orientation="h")
    fig.update_traces(textposition="inside")
    fig.update_layout(uniformtext_minsize=8, uniformtext_mode="hide")
    fig.update_layout(
        title_text=title,
        xaxis={"title": "Number of scholars", "showticklabels": False, "showgrid": False},
        yaxis_title=ylabel,
        yaxis={"categoryorder": "total ascending"},
        template="plotly_white",
    )

    fig.show()
    
create_barchart(scholar_by_country, "Country", "Clarendon Scholars: Top Ten Countries")

**b. Clarendon Scholars by College**

In [81]:
scholar_count_college = scholar_data["College"].value_counts()[:5]
create_barchart(scholar_count_college, "Colleges", "Clarendon Scholars: Top Five Colleges")

**c. Clarendon Scholars by Course**

In [82]:
scholar_count_course = scholar_data["Course"].value_counts()[:5]
create_barchart(scholar_count_course, "Courses", "Clarendon Scholars: Top Five Courses")

From the scheme's inception to 2020, the `DPhil in Clinical Medicine` course has the highest number of scholars: `25`.
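A figure like this can be read straight from the counts rather than from the chart. The sketch below uses a made-up course column, so its numbers are illustrative only and are not the `25` reported above:

```python
import pandas as pd

# Made-up course column, not the scraped data.
courses = pd.Series(
    [
        "DPhil in Clinical Medicine",
        "DPhil in Economics",
        "DPhil in Clinical Medicine",
        "DPhil in Education",
    ],
    name="Course",
)

course_counts = courses.value_counts()

# idxmax gives the label with the largest count, max gives the count itself.
top_course = course_counts.idxmax()
top_count = int(course_counts.max())
print(f"{top_course}: {top_count}")
```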

**d. Clarendon Scholars of African Descent**