# Investigating the frequency of newly incorporated companies with the same post code in the UK

<br>

In [1]:
# importing numpy.
import numpy as np

# importing pandas.
import pandas as pd

# importing regular expressions.
import re

***

<br>

#### Reading in 500 companies incorporated in 2022 from Companies House

The dataset holds the first 500 companies incorporated in the UK in 2022, exported from the Companies House [advanced search](https://find-and-update.company-information.service.gov.uk/advanced-search) [1].

In [2]:
# Forward slash avoids the invalid "\c" escape sequence and works on every platform.
data = "data/comp_house-2022.csv"

In [3]:
# Reading in CSV exported from Companies House.
comp_500 = pd.read_csv(data)
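
The `Unnamed: 0` column in the output further down suggests that the export carries an unnamed index column. As a minimal sketch, assuming that is the case, the column could be read as the row index with `index_col=0`. The inline CSV text and the name `sample` are illustrative stand-ins for the real file, not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical two-row sample shaped like the Companies House export;
# the leading empty header field is what pandas names "Unnamed: 0".
sample_csv = io.StringIO(
    ",company_name,registered_office_address\n"
    "307,121 PROPERTY SOLUTIONS LIMITED,86-90 Paul Street London EC2A 4NE\n"
    "129,ADELE & GIUSEPPE LTD,86-90 Paul Street London EC2A 4NE\n"
)

# index_col=0 uses the first column as the row index instead of keeping it as data.
sample = pd.read_csv(sample_csv, index_col=0)
print(sample.columns.tolist())
```

Whether this is preferable depends on whether the original row numbers are needed later as an ordinary column.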

***

<br>

#### Creating Regex for post code

The regular expression for UK post codes was obtained [here](https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation) [2]. 

In [4]:
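# Raw strings keep "\s" intact for the regex engine; adjacent literals are joined into one pattern.
re_post = (
    r"([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|"
    r"(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|"
    r"([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})"
)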
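
Before applying the pattern to the dataset, it may help to check it against a few known strings. The sketch below repeats the pattern so that it runs on its own; the sample inputs and the name `results` are chosen for illustration and do not come from the dataset.

```python
import re

# Same pattern as in the cell above, repeated so this sketch is self-contained.
re_post = (
    r"([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|"
    r"(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|"
    r"([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})"
)

# Illustrative inputs: four post-code-shaped strings and two that are not.
samples = ["EC2A 4NE", "M1 1AE", "SW1A1AA", "GIR 0AA", "12345", "London"]

# fullmatch requires the whole string to be a post code, unlike search.
results = {s: re.fullmatch(re_post, s) is not None for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

Note that the pattern only checks the shape of a post code; it does not confirm that the post code actually exists.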

***

<br>

#### Searching post codes within the address

Run the regular expression over each registered office address and collect the matched post codes in a list.

In [5]:
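post = []

for address in comp_500["registered_office_address"]:
    # re.search returns None when no post code is found, so check before calling group().
    match = re.search(re_post, str(address))
    post.append(match.group(0) if match else None)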
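
The same extraction can also be written without an explicit loop. This is a sketch of an alternative, not the notebook's method: it uses pandas' `Series.str.extract`, with the pattern wrapped in one outer group so that column `0` of the result holds the full match. The two addresses and the names `addresses` and `extracted` are illustrative.

```python
import pandas as pd

# Same pattern as in the notebook, repeated so this sketch is self-contained.
re_post = (
    r"([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|"
    r"(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|"
    r"([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})"
)

# Illustrative addresses: one with a post code, one without.
addresses = pd.Series([
    "86-90 Paul Street London EC2A 4NE",
    "No post code in this string",
])

# str.extract returns one column per capture group; the added outer group is column 0.
# Rows without a match come back as NaN instead of raising an error.
extracted = addresses.str.extract(f"({re_post})")[0]
print(extracted.tolist())
```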

***

<br>

#### Adding post code as separate column

In [6]:
comp_500["Post Code"] = post

***

<br>

#### Adding frequency of the post code as a column

In [7]:
# Count the rows that share each 'Post Code' and store the result in a 'PO frequency' column. transform applies the group count back to every row. [3]
comp_500['PO frequency'] = comp_500.groupby("Post Code")["Post Code"].transform('count')
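
To illustrate what `transform('count')` does, here is a small sketch on a made-up four-row frame (the name `df` and its values are illustrative, not taken from the dataset). It also shows `value_counts`, which gives one row per post code rather than one value per original row.

```python
import pandas as pd

# Made-up post codes: one appears three times, the other once.
df = pd.DataFrame({"Post Code": ["EC2A 4NE", "M1 1AE", "EC2A 4NE", "EC2A 4NE"]})

# transform("count") returns a Series aligned with the original rows,
# so every row receives the size of its own post code group.
df["PO frequency"] = df.groupby("Post Code")["Post Code"].transform("count")

# value_counts collapses the column to one entry per distinct post code.
summary = df["Post Code"].value_counts()

print(df)
print(summary)
```

The per-row form is convenient for sorting the full table, while the collapsed form is easier to read when only the distinct post codes matter.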

***

<br>

#### Displaying dataframe

Displaying the top 50 rows in descending order of 'PO frequency'.

In [8]:
comp_500.sort_values('PO frequency', ascending=False).head(50)

| Unnamed: 0 | company_name | company_number | company_status | company_type | dissolution_date | incorporation_date | nature_of_business | registered_office_address | Post Code | PO frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| 307 | 121 PROPERTY SOLUTIONS LIMITED | 13825040 | Active | Private limited company |  | 03/01/2022 | 68320 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 129 | ADELE & GIUSEPPE LTD | 13824537 | Active | Private limited company |  | 01/01/2022 | 56103 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 218 | STAY SAFE GROUP LTD | 13825222 | Active | Private limited company |  | 03/01/2022 | 70229 80100 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 305 | ACE INFINITY LTD | 13825010 | Active | Private limited company |  | 03/01/2022 | 93290 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 306 | BRIGHTER CLUB LTD | 13825044 | Active | Private limited company |  | 03/01/2022 | 86900 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 301 | LNTJ HOLDINGS LTD | 13824990 | Active | Private limited company |  | 03/01/2022 | 64209 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 309 | 23HEATING LTD | 13825025 | Active | Private limited company |  | 03/01/2022 | 43220 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 311 | J10XJJ LIMITED | 13825011 | Active | Private limited company |  | 03/01/2022 | 59112 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 492 | TECHTASTIK LTD | 13831787 | Active | Private limited company |  | 06/01/2022 | 47429 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |
| 299 | ACCEPTICON LTD | 13825000 | Active | Private limited company |  | 03/01/2022 | 47910 82990 | 86-90 Paul Street London EC2A 4NE | EC2A 4NE | 15 |


***

<br>

## References

1. https://find-and-update.company-information.service.gov.uk/advanced-search
2. https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
3. https://stackoverflow.com/questions/22391433/count-the-frequency-that-a-value-occurs-in-a-dataframe-column

***

# End