### **Number of Registered/Certified Businesses**

This dataset contains information on certified businesses, including details on their industry categories, types of certifications, employee 
sizes, small business statuses, and geographical locations. The certifications tracked include minority-owned (MBE), women-owned (WBE), 
veteran-owned, and SDO-certified businesses, reflecting the diversity and specialized certifications of businesses within various industries. 
Additionally, the dataset captures information on small local business certifications and categorizes businesses by industry and service type.

The goal of this analysis is to provide actionable insights into the landscape of certified businesses, offering a foundation for understanding 
diversity, local business representation, and certification trends across different industries. The dataset can be accessed here: https://data.boston.gov/dataset/certified-business-directory


#### **1. Loading necessary libraries**

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import geopandas as gpd
from shapely.geometry import Point

#### **2. Performing Exploratory Data Analysis**

In [2]:
df = pd.read_csv("certified_businesses.csv")
df.head()

Unnamed: 0,company_name,services_provided,mbe_wbe_cert,small_local_cert,veteran_cert,sdo_certified,city_registered,address,city,state,...,number_employees,cob_category_codes1,cob_category_codes2,cob_category_codes3,naics_codes1,naics_codes2,naics_codes3,unspsc_code1,unspsc_code2,unspsc_code3
0,"9THAI EXPRESS , LLC",FULL SERVICES OF RESTAURANT,MBE,,,No,No,"433 Cambridge Street, FIRST FLOOR., Boston, MA...",Boston,MA,...,Less than 10,,,,,,,,,
1,Turning On the Lights Global Institute,Digital Coaching Service for Small Business Ow...,MWBE,,,No,No,"6 LIBERTY SQ, Suite 3013, Boston, MA 02109",Boston,MA,...,Less than 10,TA - Training (see also ED),BU - Business Management Consultants,MK - Marketing,611430 - Professional and Management Developme...,541613 - Marketing Consulting Services,,86130000 - Specialized educational services,86120000 - Educational institutions,
2,American Dream Home Care Agency LLC,"Personal care attended , educations",WBE,,,No,No,"33 Dover Street, Brockton, MA 2301",Brockton,MA,...,11 - 20,"AHS - Health Care (Providers, Services)",,,621610 - Home Health Care Services,,,85100000 - Comprehensive health services,,
3,Enlightened Inc,An Information Technology Consulting Firm Spec...,MBE,,,No,Yes,"1205 Marion Barry Avenue SE, Suite 300, Washin...",Washington,DC,...,Greater than 100,COM - Consultants: Management,COS - Consultants: Systems Analysis,,541511 - Custom Computer Programming Services,541512 - Computer Systems Design Services,541519 - Other Computer Related Services,81110000 - Computer services,43230000 - Software,81160000 - Information Technology Service Deli...
4,The Little Cocoa Bean Co.,Baby and Toddler Food; Baby and Toddler Access...,MWBE,SLBE,,No,No,"112 South Street, Boston, MA 02111",Boston,MA,...,Less than 10,"FD - Food Products, Services",AL - Apparel,,"424330 - Women's, Children's, and Infants' Clo...",424490 - Other Grocery and Related Products Me...,,50190000 - Prepared and preserved foods,53100000 - Clothing,


In [3]:
zip_codes = ['02122', '02124', '02125', '02115', '02215', '02119', '02120', '02121', '02118'] #district 7 zipcodes

# Filter the dataset for rows where the 'zipcode' column matches one of the specified zip codes
df_district7 = df[df['zipcode'].astype(str).isin(zip_codes)]

# Display a sample of the filtered dataset
df_district7.head()

Unnamed: 0,company_name,services_provided,mbe_wbe_cert,small_local_cert,veteran_cert,sdo_certified,city_registered,address,city,state,...,number_employees,cob_category_codes1,cob_category_codes2,cob_category_codes3,naics_codes1,naics_codes2,naics_codes3,unspsc_code1,unspsc_code2,unspsc_code3
30,Audio and Lighting Unlimited,Audio and lighting equipment rental,MBE,,,No,Yes,"169 Norfolk Avenue, Boston, MA 02119",Boston,MA,...,Less than 10,AD - Advertising/Audovisual Graphic Design/Mar...,BU - Business Management Consultants,COT - Consultants: Telecommunications,532490 - Other Commercial and Industrial Machi...,,,45110000 - Audio and visual presentation and c...,83110000 - Telecommunications media services,43200000 - Components for information technolo...
38,"Black Owned Bos, LLC","Retail, Marketing, and Consulting",MWBE,SLBE,,No,Yes,"623 Tremont Street, Boston, MA 02118",Boston,MA,...,Less than 10,COM - Consultants: Management,AD - Advertising/Audovisual Graphic Design/Mar...,RT - Retail Sales of Art & Framing,541613 - Marketing Consulting Services,"453220 - Gift, Novelty, and Souvenir Stores",,80140000 - Marketing and distribution,,
39,"Bloom Architecture, Inc.",Architectural and Interior Design Services,,SLBE,,No,Yes,"784a Tremont Street, Boston, MA 02118",Boston,MA,...,Less than 10,AE - Architects/Engineers,IT - Interior Deisgn Services,,541310 - Architectural Services,541410 - Interior Design Services,,82150000 - Professional artists and performers,81100000 - Professional engineering services,
45,"Bountyful Provisions Company, LLC","Wholesale food distributor, milk, eggs, cheese...",MBE,SLBE,,Yes,Yes,"95 Ruthven Street, 1st Floor, Boston, MA 02121",Boston,MA,...,Less than 10,"FD - Food Products, Services","FD - Food Products, Services",SU - Suppliers,424410 - General Line Grocery Merchant Wholesa...,424470 - Meat and Meat Product Merchant Wholes...,112310 - Chicken Egg Production,73130000 - Food and beverage industries,,
46,Breezie Cleaning and Janitorial Services,Janitorial Cleaning; Window Cleaning; Carpet C...,MWBE,SLBE,,Yes,Yes,"27 Beech Glen Street, Boston, MA 02119",Boston,MA,...,Less than 10,JA - Janitorial Services/Supplies (see also MA...,,,561720 - Janitorial Services,,,76110000 - Cleaning and janitorial services,,
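
The filter above matches ZIP codes as exact five-character strings, so it misses any value stored as a number (leading zero dropped) or in ZIP+4 form. The address `Brockton, MA 2301` in the preview suggests leading zeros were lost somewhere upstream, although the output shown does not confirm whether the `zipcode` column itself is affected. The following is a minimal sketch of a normalization step, using a hypothetical helper `normalize_zip` and toy values rather than rows from the dataset:

```python
import re
from typing import Optional

import pandas as pd


def normalize_zip(value) -> Optional[str]:
    """Return a 5-digit ZIP string, or None if no ZIP can be recovered."""
    if pd.isna(value):
        return None
    text = str(value).strip()
    # Drop a ZIP+4 suffix ("02118-1234") or a float artifact ("2301.0").
    text = re.split(r"[-.]", text, maxsplit=1)[0]
    if not text.isdigit() or len(text) > 5:
        return None
    # Restore leading zeros lost when the value was parsed as a number.
    return text.zfill(5)


# Toy values (assumed for illustration, not taken from the real file).
sample = pd.Series(["02119", 2121, "02118-1234", 2301.0, None, "N/A"])
normalized = sample.map(normalize_zip)
print(normalized.tolist())
```

In the notebook, this could be applied as `df['zipcode'].map(normalize_zip).isin(zip_codes)` in place of the `astype(str)` comparison.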


In [4]:
df_district7.info()

<class 'pandas.core.frame.DataFrame'>
Index: 189 entries, 30 to 929
Data columns (total 30 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   company_name               189 non-null    object
 1   services_provided          189 non-null    object
 2   mbe_wbe_cert               177 non-null    object
 3   small_local_cert           124 non-null    object
 4   veteran_cert               1 non-null      object
 5   sdo_certified              189 non-null    object
 6   city_registered            189 non-null    object
 7   address                    189 non-null    object
 8   city                       189 non-null    object
 9   state                      189 non-null    object
 10  zipcode                    189 non-null    object
 11  contact_name               189 non-null    object
 12  contact_title              171 non-null    object
 13  phone                      189 non-null    object
 14  fax           

In [5]:
df_district7['number_employees'].value_counts()

number_employees
Less than 10    157
11 - 20          17
21 - 40           9
41 - 100          1
Name: count, dtype: int64

In [6]:
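# Drop rows whose size band reads '20-Nov', which is not a valid employee band.
# .copy() makes an independent frame so columns added later do not raise SettingWithCopyWarning.
df_district7_new = df_district7[df_district7['number_employees'] != '20-Nov'].copy()
df_clean = df_district7_new['number_employees'].value_counts()
df_clean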

number_employees
Less than 10    157
11 - 20          17
21 - 40           9
41 - 100          1
Name: count, dtype: int64
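
The counts are identical before and after the filter, so no `'20-Nov'` rows appear to be present in the District 7 subset. A value like `'20-Nov'` is what a spreadsheet tool typically produces when it reads `11-20` as a date. If that is the cause (an assumption, since the raw file is not shown here), the rows can be repaired rather than dropped. A minimal sketch on toy data:

```python
import pandas as pd

# Assumed mapping: "20-Nov" is treated as a date-mangled "11 - 20" band.
EMPLOYEE_BAND_FIXES = {"20-Nov": "11 - 20"}

# Toy frame for illustration only.
toy = pd.DataFrame(
    {"number_employees": ["Less than 10", "20-Nov", "11 - 20", "21 - 40"]}
)

# Map the corrupted label back to its band instead of discarding the row.
toy["number_employees"] = toy["number_employees"].replace(EMPLOYEE_BAND_FIXES)
band_counts = toy["number_employees"].value_counts()
print(band_counts)
```

Repairing keeps those businesses in every later chart, whereas the filter removes them from the analysis entirely.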

#### **3. Visualizing key metrics and relationships**

##### **3.1 Certification Types by Industry**

This analysis provides a matrix-style visualization of Certification Types by Industry Category for businesses within District 7. Using a pivoted table format with conditional color shading, we can observe the distribution and concentration of certification types across different industry categories.

In [9]:
# Grouping and pivoting the data for a tabular format
industry_certification = df_district7_new.groupby(['cob_category_codes1', 'mbe_wbe_cert']).size().reset_index(name='Count')
industry_certification_table = industry_certification.pivot(index='cob_category_codes1', columns='mbe_wbe_cert', values='Count').fillna(0)

# Normalize counts to get a color scale between 0 and 1
max_count = industry_certification_table.values.max()
color_scaled_values = industry_certification_table / max_count

# Creating the matrix table with conditional formatting
fig_matrix_table = go.Figure(data=[go.Table(
    header=dict(values=['Industry Category'] + list(industry_certification_table.columns),
                align='left',
                fill_color='lightgrey',
                font=dict(size=12, color='black')),
    cells=dict(values=[industry_certification_table.index] + [industry_certification_table[col] for col in industry_certification_table.columns],
               align='left',
               fill=dict(color=[['white'] * len(industry_certification_table)] + 
                         [[f'rgba(0, 0, 255, {val})' for val in color_scaled_values[col]] for col in industry_certification_table.columns]),
               font=dict(size=11, color='black'))
)])

fig_matrix_table.update_layout(
    title="Certification Type by Industry",
    height=600
)
fig_matrix_table.show()

##### Insights:

Reading the table:

- Darker cells mark industries with a higher count of a specific certification type, making it easy to see where diversity certifications are concentrated.

- Each row shows how the certification types compare within a single industry category.

- Across rows, the table shows which sectors have more minority-owned (MBE), women-owned (WBE), or minority- and women-owned (MWBE) certified businesses. Veteran certification is stored in a separate column (`veteran_cert`) and is not part of this table.
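
The shading above is scaled by the single largest count in the table, so it mainly highlights large industries. To compare certification mix within each industry regardless of size, the counts can be converted to row shares. A minimal sketch using `pd.crosstab` on toy data (the category labels here are shortened placeholders, not the full labels from the dataset):

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame(
    {
        "cob_category_codes1": ["FD", "FD", "FD", "AE", "AE", "JA"],
        "mbe_wbe_cert": ["MBE", "MBE", "MWBE", "WBE", "MBE", "MWBE"],
    }
)

# Raw counts: the same shape as the groupby + pivot + fillna(0) above.
counts = pd.crosstab(toy["cob_category_codes1"], toy["mbe_wbe_cert"])

# Row shares: each industry's certifications sum to 1.
row_share = pd.crosstab(
    toy["cob_category_codes1"], toy["mbe_wbe_cert"], normalize="index"
).round(2)

print(counts)
print(row_share)
```

Passing `row_share` instead of the globally scaled values to the cell colors would make a one-business industry that is entirely MWBE as dark as a large one, which may or may not be the desired emphasis.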

##### **3.2 Small Business Certification Across Industries**

This matrix table provides a visualization of Small Business Certification Across Various Industry Categories in District 7. By leveraging a pivoted tabular format with conditional color formatting, the table makes it easy to examine the concentration of small business certifications in each industry.

In [11]:
# Group and pivot the data to create a tabular format for small business certification across industries
small_cert_industry = df_district7_new.groupby(['cob_category_codes1', 'small_local_cert']).size().reset_index(name='Count')
small_cert_industry_table = small_cert_industry.pivot(index='cob_category_codes1', columns='small_local_cert', values='Count').fillna(0)

# Normalize counts to apply conditional color formatting for each cell
max_count = small_cert_industry_table.values.max()
color_scaled_values = small_cert_industry_table / max_count

# Creating the matrix table with conditional formatting
fig_small_cert_matrix = go.Figure(data=[go.Table(
    header=dict(values=['Industry Category'] + list(small_cert_industry_table.columns),
                align='left',
                fill_color='lightgrey',
                font=dict(size=12, color='black')),
    cells=dict(values=[small_cert_industry_table.index] + [small_cert_industry_table[col] for col in small_cert_industry_table.columns],
               align='left',
               fill=dict(color=[['white'] * len(small_cert_industry_table)] + 
                         [[f'rgba(0, 100, 200, {val})' for val in color_scaled_values[col]] for col in small_cert_industry_table.columns]),
               font=dict(size=11, color='black'))
)])

# Update layout for presentation
fig_small_cert_matrix.update_layout(
    title="Small Business Certification Across Different Industries",
    height=600
)
fig_small_cert_matrix.show()

##### Insights:

This table gives a compact view of small business certifications across industries:

- Darker cells point to industries with a high concentration of certified small businesses, highlighting sectors where small businesses play a substantial role.

- By comparing rows, we can assess which industries have greater representation of certified small businesses, providing insights into industry-specific growth and support for local businesses.

- This view helps stakeholders understand where small businesses are most prominent, supporting data-driven initiatives to encourage and sustain small business growth within District 7.

##### **3.3 SDO Certification Distribution**

This analysis presents the distribution of Supplier Diversity Office (SDO) certification among businesses in District 7. SDO certification aims to support diverse businesses by recognizing those that meet specific diversity and inclusion standards. The pie chart shows the share of businesses that are SDO-certified versus those that are not.

In [12]:
sdo_certified_dist = df_district7_new['sdo_certified'].value_counts().reset_index()
sdo_certified_dist.columns = ['SDO Certified', 'Count']

fig_sdo_certified = px.pie(sdo_certified_dist, names='SDO Certified', values='Count', title='SDO Certification Distribution')
fig_sdo_certified.update_layout(width=600, height=400)
fig_sdo_certified.show()

##### Insights:

- This chart reveals the prevalence of SDO-certified businesses in District 7, providing insight into the overall landscape of supplier diversity in the region.

- Understanding the proportion of SDO-certified businesses can inform support strategies for non-certified businesses that may benefit from assistance in meeting SDO standards.
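
A pie chart shows proportions visually but not the exact figures. As a minimal sketch on toy data (the `Yes`/`No` labels follow the `sdo_certified` values in the preview above), the counts and percentage shares can be tabulated side by side:

```python
import pandas as pd

# Toy series for illustration only.
toy = pd.Series(["Yes", "No", "No", "Yes", "No"], name="sdo_certified")

# Counts and percentage shares, aligned on the Yes/No index.
sdo_summary = pd.DataFrame(
    {
        "count": toy.value_counts(),
        "share_pct": (toy.value_counts(normalize=True) * 100).round(1),
    }
)
print(sdo_summary)
```

The same pattern would apply to `city_registered` or any other two-valued column in the dataset.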


##### **3.4 City Registration Distribution**

This analysis shows the City Registration Status of businesses in District 7. City registration indicates that a business is formally registered with the local government, which may be required for certain licenses, permits, or other regulatory purposes. This chart helps us understand the proportion of businesses that are officially registered with the city, giving insight into the local business compliance landscape. The pie chart categorizes businesses based on their city registration status, dividing them into registered and non-registered entities.

In [13]:
city_registered_dist = df_district7_new['city_registered'].value_counts().reset_index()
city_registered_dist.columns = ['City Registered', 'Count']

fig_city_registered = px.pie(city_registered_dist, names='City Registered', values='Count', title='City Registration Distribution')
fig_city_registered.update_layout(width=600, height=400)
fig_city_registered.show()

##### Insights:

- This chart helps identify the proportion of businesses that comply with city registration requirements, providing a measure of formal business engagement with local governance.

- Understanding the registration distribution can assist in developing outreach initiatives to support non-registered businesses, potentially guiding them towards formal registration.



##### **3.4 Company Size Distribution by Number of Employees**

This analysis explores the company size distribution of businesses in District 7, using the reported number of employees as the indicator of size. The bar chart counts businesses in each employee band, from fewer than 10 up to 41 - 100 employees, so stakeholders can see the scale of businesses operating within the district.

In [14]:
employees_dist = df_district7_new['number_employees'].value_counts().reset_index()
employees_dist.columns = ['Number of Employees', 'Count']

fig_employees = px.bar(employees_dist, x='Number of Employees', y='Count', title='Distribution by Number of Employees',
                       labels={'Count': 'Number of Businesses', 'Number of Employees': 'Company Size'})
fig_employees.update_layout(xaxis={'categoryorder': 'total descending'}, width=700, height=500)
fig_employees.show()

##### Insights:

- District 7's certified businesses are predominantly small: of the 184 businesses with a reported size, 157 have fewer than 10 employees, 17 have 11 - 20, 9 have 21 - 40, and only one has 41 - 100.

- Understanding the distribution of business sizes can help allocate resources and support tailored to different company sizes, ensuring that small, medium, and large businesses all receive appropriate assistance.

- This view of company size also provides insight into the workforce dynamics within District 7, offering an approximation of employment opportunities based on the prevalence of businesses by employee count.
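
Ordering the bars by count (`total descending`) happens to match the natural size order for this data, but that is a coincidence of the distribution. A minimal sketch of fixing the order explicitly follows; the list of band labels is taken from the outputs shown earlier, and whether the raw file contains other labels is an assumption:

```python
import pandas as pd

# Band labels seen in the value_counts output and the df.head() preview.
SIZE_ORDER = [
    "Less than 10",
    "11 - 20",
    "21 - 40",
    "41 - 100",
    "Greater than 100",
]

# Toy series for illustration only.
toy = pd.Series(["11 - 20", "Less than 10", "41 - 100", "Less than 10", "21 - 40"])

# Reindex the counts into natural size order; absent bands become 0.
ordered_counts = toy.value_counts().reindex(SIZE_ORDER, fill_value=0)
print(ordered_counts)
```

With Plotly Express, the same order can be requested by passing `category_orders={'Number of Employees': SIZE_ORDER}` to `px.bar` and dropping the `categoryorder` setting.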


##### **3.5 Certification Breakdown**

This analysis presents the certification breakdown of businesses in District 7 across the values of the `mbe_wbe_cert` column, such as MBE (Minority Business Enterprise), WBE (Women Business Enterprise), and MWBE (Minority and Women Business Enterprise). The pie chart shows the proportion of each certification type among the businesses that hold one; businesses with no value in this column are not counted.

In [15]:
certification_breakdown = df_district7_new['mbe_wbe_cert'].value_counts().reset_index()
certification_breakdown.columns = ['Certification Type', 'Count']
fig_certification_breakdown = px.pie(certification_breakdown, names='Certification Type', values='Count',
                                     title='Certification Breakdown')
fig_certification_breakdown.update_layout(width=600, height=400)
fig_certification_breakdown.show()


##### Insights:

- This chart reveals the prevalence of different certification types, helping stakeholders assess the diversity of business ownership in District 7. A higher proportion of certain certifications (e.g., MWBE) may indicate a strong presence of diverse and inclusive businesses.

- Tracking certification distribution over time can help evaluate the success of diversity and inclusion initiatives, providing a basis for future policy and support programs.

##### **3.6 Small Local Business Certification Distribution**

This analysis illustrates the Distribution of Small Local Business Certifications among businesses in District 7. Small local business certifications indicate that a business meets certain criteria as a small, locally owned enterprise, often eligible for local support programs. This pie chart shows the proportion of businesses that have obtained this certification, offering insights into the presence of small, locally focused businesses within the district.

In [16]:
small_local_cert_dist = df_district7_new['small_local_cert'].value_counts().reset_index()
small_local_cert_dist.columns = ['Small Local Certification', 'Count']
fig_small_local_cert = px.pie(small_local_cert_dist, names='Small Local Certification', values='Count',
                              title='Small Local Business Certification Distribution')
fig_small_local_cert.update_layout(width=600, height=400)
fig_small_local_cert.show()


##### Insights:

- This chart reveals the extent to which small local businesses are present in District 7, highlighting the role of locally owned enterprises in the district’s economy.

- Knowing the proportion of small local businesses can help guide support initiatives, encouraging more small businesses to gain certification or supporting certified businesses in their growth.
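
One caveat: `value_counts()` skips missing values by default, and only 124 of the 189 District 7 rows have a non-null `small_local_cert`. The pie chart therefore shows the mix among certified businesses, not certified versus uncertified. A minimal sketch of including the missing rows, assuming that a missing value means the business holds no small local certification (the label `Not certified` is introduced here for illustration):

```python
import pandas as pd

# Toy frame for illustration only; None stands in for a missing value.
toy = pd.DataFrame({"small_local_cert": ["SLBE", None, "SLBE", None, "SLBE"]})

# Default behaviour: missing values are silently excluded.
certified_only = toy["small_local_cert"].value_counts()

# Label missing values explicitly so they appear as their own slice.
with_uncertified = toy["small_local_cert"].fillna("Not certified").value_counts()

print(certified_only)
print(with_uncertified)
```

The same default applies to `groupby`, so the matrix tables and stacked bar charts in this notebook also leave out rows with a missing certification value.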


##### **3.7 Employee Size vs. Certification Type**

This analysis examines the relationship between **Employee Size and Certification Type** for businesses in District 7. By categorizing businesses based on the number of employees and their certification type (e.g., MBE, WBE, MWBE), this bar chart provides insights into the diversity landscape across different company sizes.

In [17]:
employee_vs_certification = df_district7_new.groupby(['number_employees', 'mbe_wbe_cert']).size().reset_index(name='Count')
fig_employee_vs_certification = px.bar(employee_vs_certification, x='number_employees', y='Count', color='mbe_wbe_cert',
                                       title='Employee Size vs Certification Type',
                                       labels={'number_employees': 'Number of Employees', 'Count': 'Number of Businesses', 'mbe_wbe_cert': 'Certification Type'})
fig_employee_vs_certification.update_layout(xaxis={'categoryorder': 'total descending'}, width=700, height=500)
fig_employee_vs_certification.show()


##### Insights:

- This chart highlights the presence of different certification types within small, medium, and large businesses. Patterns may reveal if minority-owned or women-owned certifications are more prevalent in certain business sizes.

- Understanding the breakdown of certification types by company size can help tailor support programs. For example, if most certified businesses are small, resources could be directed toward helping these companies grow.

##### **3.8 Employee Size vs. Small Business Certification**

This analysis explores the relationship between Employee Size and Small Business Certification status for businesses in District 7. By categorizing businesses based on the number of employees and their small business certification status, this bar chart offers insights into the scale and reach of certified small businesses within the district.

In [18]:
# Group by 'number_employees' and 'small_local_cert' to analyze small business certifications
employee_vs_small_certification = df_district7_new.groupby(['number_employees', 'small_local_cert']).size().reset_index(name='Count')

fig_employee_vs_small_certification = px.bar(
    employee_vs_small_certification,
    x='number_employees',
    y='Count',
    color='small_local_cert',
    title='Employee Size vs Small Business Certification',
    labels={'number_employees': 'Number of Employees', 'Count': 'Number of Businesses', 'small_local_cert': 'Small Business Certification'}
)
fig_employee_vs_small_certification.update_layout(xaxis={'categoryorder': 'total descending'}, width=700, height=500)
fig_employee_vs_small_certification.show()


##### Insights:

- This chart highlights the concentration of small business certifications across different employee sizes, showing if most certified small businesses are smaller (e.g., less than 10 employees) or if some certifications extend to larger companies.

- Understanding which sizes of businesses are more likely to be certified as small local businesses can help in designing support programs tailored to their needs. For instance, if most small certified businesses are very small, they might benefit from growth assistance.

##### **3.9 Employee Size Trends Over Time**

This analysis examines employee size by year of establishment for businesses in District 7. For each establishment year, the line chart counts the businesses in each employee size category, showing when businesses of each size band were founded. The directory holds a single size band per business, so the chart does not show how individual companies grew over time.

In [23]:
df_district7_new['year_established'] = pd.to_datetime(df_district7_new['date_business_established'], errors='coerce').dt.year
employee_size_trends = df_district7_new.groupby(['year_established', 'number_employees']).size().reset_index(name='Count')
fig_employee_size_trends = px.line(
    employee_size_trends, x='year_established', y='Count', color='number_employees',
    title='Employee Size Trends Over Time',
    labels={'year_established': 'Year Established', 'Count': 'Number of Businesses', 'number_employees': 'Employee Size'}
)
fig_employee_size_trends.show()

##### Insights:

- This chart shows how many of the listed businesses were established in each year, split by reported size band. It helps identify periods when more businesses of a given size were founded, such as years with a higher number of small-business formations.

- The chart can reveal the impact of economic events on business formation and size over time. For instance, noticeable dips or rises in certain years may correlate with broader economic trends or local development initiatives.
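
Per-year counts for a district of this size are small and noisy, so a running total can be easier to read. The sketch below uses toy data; the date format of `date_business_established` in the real file is an assumption, and unparseable values are coerced to missing as in the cell above:

```python
import pandas as pd

# Toy frame for illustration only; one value is deliberately unparseable.
toy = pd.DataFrame(
    {
        "date_business_established": [
            "2015-03-01",
            "2015-07-15",
            "2018-01-10",
            "not a date",
            "2020-05-05",
        ]
    }
)

# Same parsing as the notebook: invalid dates become NaT, then NaN years.
year = pd.to_datetime(toy["date_business_established"], errors="coerce").dt.year

# Businesses established per year, then the running total over time.
per_year = year.dropna().astype(int).value_counts().sort_index()
cumulative = per_year.cumsum()

print(per_year)
print(cumulative)
```

Checking `year.isna().sum()` first shows how many rows drop out of the trend charts because their establishment date could not be parsed.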

##### **3.10 Longevity of Certified Businesses**

This analysis explores the Longevity of Certified Businesses in District 7, measured by the age of each business since its establishment. By categorizing businesses based on their certification type and age in years, this bar chart offers insights into the stability and longevity of certified businesses within the district.

In [24]:
df_district7_new['business_age'] = 2024 - df_district7_new['year_established']  # Replace 2024 with the current year
longevity_certification = df_district7_new.groupby(['business_age', 'mbe_wbe_cert']).size().reset_index(name='Count')
fig_longevity_certification = px.bar(
    longevity_certification, x='business_age', y='Count', color='mbe_wbe_cert',
    title='Longevity of Certified Businesses',
    labels={'business_age': 'Age of Business (years)', 'Count': 'Number of Businesses', 'mbe_wbe_cert': 'Certification Type'}
)
fig_longevity_certification.update_layout(xaxis={'categoryorder': 'total descending'})
fig_longevity_certification.show()

##### Insights:

- This chart reveals patterns in business age across different certification types, helping to identify which certifications are associated with longer-standing businesses. For example, a higher concentration of older businesses with a particular certification may indicate stability in that category.

- By examining the distribution of business ages, we gain insight into the resilience of certified businesses within the district, as older businesses may indicate stability and adaptation over time.
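
Plotting one bar per exact age spreads a small number of businesses across many bars. A minimal sketch of grouping ages into bands follows, with the reference year passed in explicitly instead of hard-coded. The helper name `add_business_age` and the band edges are illustrative choices, not part of the original analysis:

```python
import pandas as pd


def add_business_age(frame: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    """Return a copy of frame with business_age and age_band columns added."""
    out = frame.copy()
    out["business_age"] = reference_year - out["year_established"]
    # Band edges are an illustrative assumption: 0-5, 6-10, 11-20, 21+ years.
    out["age_band"] = pd.cut(
        out["business_age"],
        bins=[0, 5, 10, 20, 200],
        labels=["0-5", "6-10", "11-20", "21+"],
        include_lowest=True,
    )
    return out


# Toy frame for illustration only.
toy = pd.DataFrame({"year_established": [2023, 2016, 2009, 1990]})
aged = add_business_age(toy, reference_year=2024)
print(aged)
```

Calling it with `reference_year=pd.Timestamp.now().year` keeps the ages current when the notebook is rerun, and grouping on `age_band` instead of `business_age` gives four bars per certification type.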

##### **3.11 Certification Trends Over Time**

This analysis examines Certification Trends Over Time for businesses in District 7. By tracking the number of certified businesses established each year, categorized by certification type, this line chart provides insights into the growth and evolution of diversity certifications within the district.

In [25]:
certification_trends = df_district7_new.groupby(['year_established', 'mbe_wbe_cert']).size().reset_index(name='Count')
fig_certification_trends = px.line(
    certification_trends, x='year_established', y='Count', color='mbe_wbe_cert',
    title='Certification Trends Over Time',
    labels={'year_established': 'Year Established', 'Count': 'Number of Businesses', 'mbe_wbe_cert': 'Certification Type'}
)
fig_certification_trends.show()


##### Insights:

- This chart shows how many of the currently certified businesses were established in each year, by certification type. The x-axis is the year of establishment, not the year of certification, so a rise in MWBE counts in recent years means that more recently founded MWBE businesses appear in the directory. That may indicate growing support for both minority- and women-owned businesses.

- Understanding certification trends over time can help assess the impact of diversity and inclusion initiatives. Increased certifications in certain periods may correlate with local or national policies supporting diverse businesses.


#### **Conclusion:**

This analysis provides an in-depth look into the registered/certified business landscape of District 7, with a focus on diversity, small business representation, and business growth over time. By examining a variety of factors—such as certification types, business size, longevity, and registration status—this analysis offers valuable insights into the economic makeup of District 7 and highlights areas for potential development and support.

This analysis underscores the importance of diversity, small business support, and targeted resources for local economic development. Stakeholders can use these insights to:
- Develop **support programs** tailored to the specific needs of certified small businesses and those in high-growth areas.

- **Promote diversity and inclusion** by supporting businesses that contribute to a more inclusive economic landscape.

- **Encourage formal registration** among businesses, helping them access resources and comply with city regulations.

- Monitor **long-term stability** and growth patterns, ensuring that both newer and established businesses receive appropriate support.