In [15]:
# experimental: you can run this cell to see your code full-width

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# Final Individual Assignment: 


## Answer 2 Business Questions using Companies House API

This assignment follows exactly the same format as all of your other assignments for this course.

**As in all the other assignments you are asked to:**

- look at a dataset using python and identify a business question
- write a mini-report that answers your business question 
- accompany your report with a visualisation (table, graph, etc)

**What is different from previous assignments:**

- this project is **individual**, you are not working in pairs
- you are given a specific dataset (Companies House API)
- you are asked to answer **TWO DIFFERENT QUESTIONS**. Each question will have its own code, report and visualisation
- you need to identify questions by yourself. Ask about something that can be answered with this data. Also attempt to ask something that can be useful in a business context.
- try to keep each of the 2 questions quite different from each other (eg. do not make them reuse all of the same code or identical visualisation)
- all libraries and visualisation methods are allowed
- your report does NOT need to use any complicated statistical or programming techniques. Rather, try to look into what is available in the data and find something interesting in it
- if you keep running out of your API allowance/quota (because you are making a lot of API calls), consider saving your API results in a file, then getting more API results and adding them to the same file.
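
The last point above (saving API results to a file and adding to it later) could be sketched roughly as follows. This is only one possible approach: the helper names `load_cached_results` and `append_results_to_cache` and the file name are made up for illustration, and the sketch assumes each result is a dictionary with a `company_number` key.

```python
import json
from pathlib import Path


def load_cached_results(cache_path):
    """Return previously saved API results, or an empty list if no file exists yet."""
    path = Path(cache_path)
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as cache_file:
        return json.load(cache_file)


def append_results_to_cache(new_results, cache_path, key="company_number"):
    """Add new results to the cache file, skipping records whose key is already stored."""
    cached = load_cached_results(cache_path)
    seen_keys = {record.get(key) for record in cached}
    for record in new_results:
        if record.get(key) not in seen_keys:
            cached.append(record)
            seen_keys.add(record.get(key))
    # Rewrite the whole file so it always holds one valid JSON list.
    with Path(cache_path).open("w", encoding="utf-8") as cache_file:
        json.dump(cached, cache_file, indent=2)
    return cached
```

For example, after fetching a batch of company details you could call `append_results_to_cache(details, "companies_cache.json")`, and in a later session start from `load_cached_results("companies_cache.json")` instead of calling the API again.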

### Example Questions: You can use these, change them or design your own

**What makes a company more fair in terms of gender equality?** Look at the data about ownership/voting - does this data behave differently for people of different genders, ages and nationalities? hint: We can sometimes deduce a person's gender from their title (Mr, Mrs) but not always (Dr); also, Companies House offers a limited number of options and not everyone can find their preferred title/pronoun there. You could eg. compare companies with the word X and word Y in their name, or look at some other variable to see which companies are more diverse than others.
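
As a rough illustration of the title hint, here is a minimal sketch that maps a title to a gender category and falls back to 'unknown' for anything else (Dr, missing titles, and so on). The title list is deliberately short and is an assumption, as is the idea that a person record carries its title under `name_elements`; check the actual API responses before relying on either.

```python
# Illustrative mapping only; extend or change it to suit your data.
TITLE_TO_GENDER = {
    "mr": "male",
    "mrs": "female",
    "miss": "female",
    "ms": "female",
}


def gender_from_title(title):
    """Return 'male', 'female' or 'unknown' for a title such as 'Mr' or 'Dr'."""
    if not title:
        return "unknown"
    normalised = title.strip().rstrip(".").lower()
    return TITLE_TO_GENDER.get(normalised, "unknown")


def gender_of_person(person):
    """Read the title from a person record; the 'name_elements' layout is an assumption."""
    title = person.get("name_elements", {}).get("title")
    return gender_from_title(title)
```

Keeping an explicit 'unknown' category makes it visible how much of the data cannot be classified this way, which is worth reporting alongside any result.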

**When's a good time to start a tourism business?** Looking at tourism-related companies, do you see patterns in years/months of creation and cessation (closing)? Are companies created at some times more likely to survive for longer? note: to get tourism-related companies, first fetch company details for many tourism words ['travel','trip','cruise',...] and then check their Standard Industrial Classification (SIC) codes.
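
One way to look at creation months and survival is to turn each company's dates into a small table. The sketch below assumes the detailed company records contain `date_of_creation` and, for closed companies, `date_of_cessation` as ISO date strings; the function name `company_lifetimes` is made up for this example.

```python
import pandas as pd


def company_lifetimes(companies):
    """Build a table of creation year/month and lifetime in days for each company."""
    table = pd.DataFrame(
        {
            "company_number": [c.get("company_number") for c in companies],
            "created": pd.to_datetime([c.get("date_of_creation") for c in companies]),
            "ceased": pd.to_datetime([c.get("date_of_cessation") for c in companies]),
        }
    )
    table["creation_year"] = table["created"].dt.year
    table["creation_month"] = table["created"].dt.month
    # Companies with no cessation date get a missing (NaN) lifetime.
    table["lifetime_days"] = (table["ceased"] - table["created"]).dt.days
    return table
```

From here, something like `table.groupby("creation_month")["lifetime_days"].median()` would compare survival by month of creation. Companies that are still open have no lifetime yet, so decide explicitly how to treat them rather than silently dropping them.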

**What is the best place in UK to set up a fintech company?** Did it change over the years? Look at locations of companies in a sector, and where they are based. Does the count of companies, or the lengths of time they survive, change over time? Does it change differently in different locations? note: to get industry's companies you could use the method described above (using many words and filtering SIC codes)
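
The "many words, then filter by SIC code" method mentioned above might look roughly like this. It assumes each detailed company record may have a `sic_codes` list of strings; both function names are hypothetical helpers, not part of the starter code.

```python
def companies_in_sector(companies, sector_sic_codes):
    """Keep only companies that list at least one of the given SIC codes."""
    wanted = {str(code) for code in sector_sic_codes}
    return [
        company
        for company in companies
        if wanted.intersection(company.get("sic_codes", []))
    ]


def collect_unique_companies(company_batches):
    """Merge several lists of companies, dropping repeats by company number."""
    unique = {}
    for batch in company_batches:
        for company in batch:
            # The first record seen for each company number is the one kept.
            unique.setdefault(company["company_number"], company)
    return list(unique.values())
```

You would fetch one batch per search word, merge the batches with `collect_unique_companies`, and then pass the merged list and your chosen sector codes to `companies_in_sector`. Removing repeats matters because the same company can match several search words.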

Note: these really are just examples. Feel free to look at anything that is of interest to you, and can be answered with this data

## Deliverables for each Question: Code, Mini-report, Visualisation

### Code

In your notebook please include all the **code** that you used to arrive at the conclusion. 

- It is absolutely ok to re-use your own code from one question in another question
- But for each Question, please include the most important parts of the code that help you to answer it


### Mini-report

At the end of your notebook, you should provide:

- a markdown (text) cell with your mini-report (250 words +/- 10%). For reference, the 'Marking criteria' section below has around 220 words.

### Visualisation

Use the data you extracted to further your argument with visualisations:

- individual cells that will generate MAX TWO GRAPHS OR TABLES for EACH of your mini-reports (you are allowed to combine a number of graphs, if they are combined into one unit and make sense, eg. combining 3 bar charts side by side, or overlapping a line chart with a bar chart). Make sure it is clear which graph/graphs belong where and are to be marked.

## Marking Criteria for each Question (same as in previous assignments)

**Business Question and Answer:**

Did you manage to find a question that can be answered with given data? As an analyst, you will often have to dive into available data and identify how it can help the business, or solve a problem, without first knowing what the problem is. Also, can you formulate a clear answer to the question you created?
 
- 50% - C - GOOD - question and answer are clear, well defined and connect with the dataset
- 60% - B - VERY GOOD - argument is clearly positioned in a business context, and attempts to provide value/insight
- 70% - A - EXCELLENT - insights are novel, actionable and the writeup is of publishable quality.

**Using the Data:**

How well did you use the data to answer your question? Your answer should be supported by what you found in the data. Briefly describe why this was the correct data, and the correct analysis to perform on it.
 
- 50% - C - GOOD - data selected is appropriate for the task, analysis is clear, the source is mentioned
- 60% - B - VERY GOOD - data analysis section advances the argument, makes a clear point and is easy to read and understand
- 70% - A - EXCELLENT - analysis is insightful, using multiple parts of the data set in a creative way

**Visualisation:**

Can you aid your argument/answer with visual cues? A graph can say a thousand words, but it is also easy to make one that is confusing or misleading. Use simple (or highly customised) graphs to make your argument clearer.
 
- 50% - C - GOOD - graph is communicative, appropriate and of similar complexity to those in the notes
- 60% - B - VERY GOOD - graph is customised and combines a number of styles and types of visualisation
- 70% - A - EXCELLENT - graph is using clear visual language to make a point, adds to the argument, and is of publishable quality

**Code Quality:**

Is your code clean, readable and DRY (Don't repeat yourself)? Are you using good readable variable names? Did you clean up your code and does it not include any old/unused parts?
 
- 50% - C - GOOD - code has meaningful variable names, no needlessly repeated code
- 60% - B - VERY GOOD - also signposted, reasonably commented and cleaned up
- 70% - A - EXCELLENT - also code has a logical flow, consistency of names and granularity/size

**Code Structure:**

Is your code well structured and broken down? Just like good writing has sentences, paragraphs and chapters, good code should be split into sections. Break down your code into cells and functions. Use meaningful signposts (eg. comments, function names) to guide the reader through your code.
 
- 50% - C - GOOD - code is broken down into cells, by the code's purpose
- 60% - B - VERY GOOD - code is broken down by cell and also separated and readable. Attempts at reusing code are made
- 70% - A - EXCELLENT - code is split into functions and/or objects and can be easily reused

# API we will use: Companies House

Companies House is the official public register of all companies in the UK. You can search for companies, people etc. You can also get basic information about many companies, or request detailed information about one company. You will need to create an account - probably best if you use your university email address. You can find more information on the Companies House website and at https://developer.company-information.service.gov.uk/get-started. 

**All possible API calls you can make:**

There are a number of calls you can make to the API. Find all the details (and info about the format of answers) here: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/reference

**Extra notes:  Standard Industrial Classification (SIC)**

Company information includes 'industry type' as a SIC code. The list of codes is here (you can load the file and get values from there): https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/527619/SIC07_CH_condensed_list_en.csv

For example:

- 72200 Research and experimental development on social sciences and humanities
- 73110 Advertising agencies
- 73120 Media representation services
- 73200 Market research and public opinion polling
- 74100 specialised design activities
- 74201 Portrait photographic activities
- 74202 Other specialist photography
- 74203 Film processing
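
Loading that file into a lookup table could be done along these lines. This is a sketch under the assumption that the CSV has exactly two columns (code, then description) with a header row; the function name and the renamed column labels are made up for the example.

```python
import pandas as pd

SIC_LIST_URL = (
    "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/"
    "attachment_data/file/527619/SIC07_CH_condensed_list_en.csv"
)


def sic_descriptions_from_csv(csv_source):
    """Return a dict mapping SIC code (as a string) to its description."""
    # dtype=str keeps any leading zeros in the codes instead of parsing them as numbers.
    sic_table = pd.read_csv(csv_source, dtype=str)
    sic_table.columns = ["sic_code", "description"]  # assumes exactly two columns
    return dict(
        zip(sic_table["sic_code"].str.strip(), sic_table["description"].str.strip())
    )
```

`pd.read_csv` accepts a URL as well as a local path, so `sic_descriptions_from_csv(SIC_LIST_URL)` should work directly. Saving a local copy of the file first avoids downloading it every time the notebook runs.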

# If you do not have one yet: creating a developer account and API  key 

You likely already have an account and key from your previous assignments - it's probably easiest and best if you reuse that account and key.

To create a new one:

- Go to the 'register a user account' link https://find-and-update.company-information.service.gov.uk/signin, enter your **school email** and then click the link you receive by email.
- set up a password
- log in to the page
- go to New Application on top (https://developer.company-information.service.gov.uk/manage-applications/add)
- enter a name and short description for your API project (just say it's a university project) and choose **'Live' where you're asked 'environment for your application'**. Leave optional fields blank.
- go to 'View all applications' (https://developer.company-information.service.gov.uk/manage-applications)
- click on the name of your application, and then 'Create new key'
- Select these options:
    - Key name and description: write anything here, like 'python project'
    - Select the type of API client key you want to create: **REST**
- leave other options empty, and click 'Create Key'
- When done, scroll down and copy-paste your key to this notebook. The key will look a bit like this: e3123ad12-fd44-4aad-9389-f7dccccc6788

Once you are set up:

- all the possible requests can be found here: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/reference
- by the way: you are allowed to make 600 requests within each five-minute period; after that period, you get another 600, and so on. If you use up all 600 requests, the API will make you wait for a few seconds/minutes.
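
If you would rather pace your own requests than hit the limit, a small throttle like the sketch below is one option. It is not part of the starter code: `RequestThrottle` is a hypothetical helper that counts requests over a sliding five-minute window, which is slightly stricter than the limit described above, and it does not model what the API itself does when the limit is exceeded.

```python
import time
from collections import deque


class RequestThrottle:
    """Pause before a request when the allowance for the current window is used up."""

    def __init__(self, max_requests=600, window_seconds=300,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.request_times = deque()

    def wait_for_slot(self):
        """Call this just before each API request."""
        now = self.clock()
        # Forget requests that have fallen out of the window.
        while self.request_times and now - self.request_times[0] >= self.window_seconds:
            self.request_times.popleft()
        if len(self.request_times) >= self.max_requests:
            # Sleep until the oldest remembered request leaves the window.
            self.sleep(self.window_seconds - (now - self.request_times[0]))
            now = self.clock()
            self.request_times.popleft()
        self.request_times.append(now)
```

To use it, create one `throttle = RequestThrottle()` for the whole notebook and call `throttle.wait_for_slot()` immediately before each `requests.get(...)`.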

# Some functions to get you started: 



In [16]:
# list of libraries for this project
%pip install requests
import requests
import pprint as pp
from datetime import date, datetime
import math
import pandas as pd
# you might need to add a few of your own






In [17]:
# function for calling the API to retrieve JSON
def call_api_with(url_extension):
    your_company_house_api_key = "9ca44c28-e54d-4c26-9bf7-6c766c53c054"

    login_headers = {"Authorization": your_company_house_api_key}
    url = f"https://api.companieshouse.gov.uk/{url_extension}"
    # above: could be eg. https://api.companieshouse.gov.uk/search/companies?q=shop&items_per_page=1
    print(f"requesting: {url}")
    # above, optional: printing, so that you see visually how many calls you are making
    res = requests.get(url, headers=login_headers)  # , verify=False)
    return res.json()

In [18]:
# test to get one company
def get_one_test_company_or_error():
    url = "search/companies?q=shop&items_per_page=1"
    return call_api_with(url)

In [19]:
# search company with specific query / keyword
def search_for_companies_with_query(query, number_of_companies=100):
    # for simplicity round up the number of returned companies to the nearest hundred. eg. 130 becomes 200
    page_size = 100
    number_of_pages = math.ceil(number_of_companies / page_size)  # round up
    companies = []
    for page_index in range(0, number_of_pages):
        url = f"search/companies?q={query}&items_per_page={page_size}&index_start={page_index*page_size}"
        companies += call_api_with(url).get("items", [])
    return companies

In [20]:
# request to get company data based on company number
def data_for_company(company_number):
    url = f"company/{company_number}"
    return call_api_with(url)

In [21]:
# get all persons with significant control for the company with the given company number
def all_significant_person_in_company(company_number):
    url = f"company/{company_number}/persons-with-significant-control"
    return call_api_with(url).get("items", [])


# get all officers of the company with the given company number
def all_officers_in_company(company_number):
    url = f"company/{company_number}/officers"

    return call_api_with(url).get("items", [])

In [22]:
def detailed_info_about_companies_with_ids(companies_numbers):
    results = []
    for company_number in companies_numbers:
        results.append(data_for_company(company_number))
    return results

In [23]:
# you are likely to use top level functions like this one.
def detailed_info_about_companies_with_name(name, how_many=10):
    # eg. unless otherwise stated, just grab 10 companies detailed info
    companies_basic_info = search_for_companies_with_query(name, how_many)
    companies_ids = [company["company_number"] for company in companies_basic_info]
    companies = detailed_info_about_companies_with_ids(companies_ids[:how_many])
    return companies

In [76]:
# example usage.
# note: To save your quota, try to fetch data first, and then analyse it.
details = detailed_info_about_companies_with_name("SUMUP LIMITED", how_many=5)

requesting: https://api.companieshouse.gov.uk/search/companies?q=SUMUP LIMITED&items_per_page=100&index_start=0
requesting: https://api.companieshouse.gov.uk/company/05394570
requesting: https://api.companieshouse.gov.uk/company/14091988
requesting: https://api.companieshouse.gov.uk/company/14478034
requesting: https://api.companieshouse.gov.uk/company/13895062
requesting: https://api.companieshouse.gov.uk/company/07836562


In [77]:
pp.pprint(len(details))
pp.pprint([detail['company_name'] for detail in details])

5
['SUMUP LIMITED',
 'SUMUP FASHION LIMITED',
 'SUMUP MEDIA LIMITED',
 'SUMUP ONLINE STORES UK LTD',
 'SUMUP PAYMENTS LIMITED']


In [78]:
pp.pprint(details[0])

{'accounts': {'accounting_reference_date': {'day': '31', 'month': '10'},
              'last_accounts': {'made_up_to': '2022-10-31',
                                'period_end_on': '2022-10-31',
                                'period_start_on': '2021-11-01',
                                'type': 'micro-entity'},
              'next_accounts': {'due_on': '2024-07-31',
                                'overdue': False,
                                'period_end_on': '2023-10-31',
                                'period_start_on': '2022-11-01'},
              'next_due': '2024-07-31',
              'next_made_up_to': '2023-10-31',
              'overdue': False},
 'can_file': True,
 'company_name': 'SUMUP LIMITED',
 'company_number': '05394570',
 'company_status': 'active',
 'confirmation_statement': {'last_made_up_to': '2023-03-16',
                            'next_due': '2024-03-30',
                            'next_made_up_to': '2024-03-16',
                            'overdue':

In [79]:
# all_officers_in_company
officers = all_officers_in_company(details[0]['company_number'])
pd.DataFrame(officers)
# pp.pprint(officers)

requesting: https://api.companieshouse.gov.uk/company/05394570/officers


Unnamed: 0,officer_role,appointed_on,links,address,name,nationality,occupation,date_of_birth,country_of_residence
0,secretary,2005-03-16,{'officer': {'appointments': '/officers/Dugfkr...,"{'address_line_1': '28 Pangfield Park', 'local...","WILLIAMS, Angela Mary",,,,
1,director,2005-03-16,{'officer': {'appointments': '/officers/0VLyBm...,"{'locality': 'Coventry', 'address_line_1': '28...","WILLIAMS, Michael John",British,Accountant,"{'month': 3, 'year': 1950}",England


### End of example code

# Business Question 1:

### Business Question 1: Code:

In [40]:
# YOUR CODE HERE
raise NotImplementedError()

NotImplementedError: 

### Business Question 1: Mini-report and visualisation:

### Visualisation


### Mini-report

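As one of the world's financial centres, the UK is a globally renowned hub for banking, and London in particular is known as a financial city. This led to my Business Question: which UK city is the best place to set up a new bank?

To explore this question, I used the Advanced search of the Companies House API to find all companies established after 2015 with a SIC code equal to 64191 (SIC code for Banks). This gives precise search results, which I used to build an interactive scatter-plot map displayed year by year.

Our results are very interesting: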



# Business Question 2:

### Business Question 2: Code:

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Business Question 2: Mini-report and visualisation:

