How to get the New York Times' attention?

Problem: According to the U.S. Census, New York State has a total population of 20,201,249. According to statista.com, the New York Times had 5.26 million paid subscribers in the first quarter of 2021. Even if every one of those subscribers lived in New York, they would make up only about 26% of the state's population. How can the New York Times increase the number of New York subscribers?

Impact Hypothesis: By understanding the demographic, social, housing, and economic characteristics of New Yorkers, the New York Times can run personalized subscriber outreach campaigns for segmented groups in order to increase New York readership.

Dataset: American Community Survey 5-Year Data (2009-2019)

Preliminary Analysis: 

- What are the education groups?
- What are their ages?
- What are their income levels?
- Employed or unemployed?
- What industry?
- Type of commute to work?
- Veteran status?
- What are their ethnicities?
- Are they foreign-born or native-born?


Solution Path 1:
Classification to find which zip codes have more potential subscribers, for example with logistic regression. Still to be decided: what the target label is and which data informs it.
The New York Times should see an increase in subscribers as it focuses on New Yorkers who are likely to subscribe.
The model predicts accurately on a test set.
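
Solution Path 1 leaves the target label open. As a minimal sketch, assume a hypothetical binary label per zip code ("high subscription" or not) and two made-up features; neither comes from this notebook's data. In practice scikit-learn's `LogisticRegression` would be the usual choice. The version below uses plain batch gradient descent with only the standard library so the mechanics stay visible, and it scores the model on held-out rows as the success criterion above suggests.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, xi):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) >= 0.5)

# Synthetic zip-code features (assumed): share with a bachelor's degree,
# share of households earning over $100,000.
random.seed(0)
X, y = [], []
for _ in range(200):
    edu, inc = random.random(), random.random()
    X.append([edu, inc])
    # Hypothetical label rule, invented purely for this illustration.
    y.append(int(edu + inc > 1.0))

# Hold out the last 50 rows as the test set.
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
w = fit_logistic(X_train, y_train)
accuracy = sum(predict(w, xi) == yi for xi, yi in zip(X_test, y_test)) / len(X_test)
print(f"test accuracy: {accuracy:.2f}")
```

With real data the label would have to come from subscriber counts per zip code, which the Census tables do not contain.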

Solution Path 2: 
Unsupervised clustering to understand the types of New Yorkers living in different zip codes and how to better cater to target groups.
The New York Times should see an increase in subscribers as it focuses on relevant messaging to potential subscribers.
The clustering groups New Yorkers accurately and identifies target groups to whom the New York Times can promote relevant messages.
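
To make Solution Path 2 concrete, here is a minimal k-means sketch (Lloyd's algorithm) on made-up zip-code profiles. The two features, the group names, and the farthest-point initialisation are assumptions chosen for illustration; scikit-learn's `KMeans` would be the usual tool in a real analysis.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    # Deterministic farthest-point initialisation (a choice made for this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(pt, centroids[c])) for pt in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up profiles: (share aged 25 to 34, share with a graduate degree).
rng = random.Random(1)
group_a = [[rng.uniform(0.25, 0.35), rng.uniform(0.30, 0.40)] for _ in range(20)]
group_b = [[rng.uniform(0.05, 0.15), rng.uniform(0.05, 0.15)] for _ in range(20)]
points = group_a + group_b

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Each centroid can then be read as the "typical" profile of its group, which is what a targeted message would be written for.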


In [10]:
import requests
import censusdata
import pandas as pd

In [11]:
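def acquire_nys_zipcode(src, year):
    """Return the 5-digit ZCTA codes for New York State (FIPS 36) as strings."""
    nys_zipcodes = [] 
    zipcode = censusdata.geographies(censusdata.censusgeo([('state', '36'), ('zip code tabulation area', '*')]), src, year)
    # Each key is a geography name; its last five characters are taken as the zip code.
    for item in zipcode.items():
        nys_zipcodes.append(str(item[0])[-5:])
    return nys_zipcodes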

In [12]:
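def dl_nyscensus_data(attribute_list, colname_list, filename):
    """Download 2019 ACS 5-year data for every New York State ZCTA and save it as Data/<filename>.csv."""
    
    zipcode = acquire_nys_zipcode('acs5', 2019)
    
    df = censusdata.download('acs5', 2019,
           censusdata.censusgeo([('state', '36'),
                                 ('zip code tabulation area', '*')]), attribute_list)
    
    # Relabel rows with zip codes; this assumes download() returns rows
    # in the same order as geographies().
    df = df.rename(index=dict(zip(df.index, zipcode)))
    # Replace the ACS variable codes with readable column names.
    df = df.rename(columns=dict(zip(df.columns, colname_list)))
    
    # The file is written relative to the notebook's working directory.
    df.to_csv(f'Data/{filename}.csv', index=True)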
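
`dl_nyscensus_data` pairs variable codes with column names purely by position, and `zip` silently stops at the shorter list, so a missing name would shift or drop columns without an error. A small pre-flight check can catch that before any download. `check_lengths` below is a hypothetical helper, not part of the notebook:

```python
def check_lengths(attribute_list, colname_list):
    """Raise early if variable codes and column names cannot be paired one to one."""
    if len(attribute_list) != len(colname_list):
        raise ValueError(
            f"{len(attribute_list)} variable codes but {len(colname_list)} column names"
        )
    return dict(zip(attribute_list, colname_list))

# Same employment variables and names as used later in the notebook.
emp_list = ['B23025_001E', 'B23025_002E', 'B23025_003E',
            'B23025_004E', 'B23025_005E',
            'B23025_006E', 'B23025_007E']
emp_col = ['Population 16 and Over', 'In Labor Force Total', 'Civilian Labor Force',
           'Civilian Employed', 'Civilian Unemployed', 'Armed Forces', 'Not in Labor Force']

mapping = check_lengths(emp_list, emp_col)
print(mapping['B23025_005E'])  # Civilian Unemployed
```

Calling it at the top of the download function would turn a silent mislabeling into an immediate `ValueError`.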

In [36]:
emp_list = ['B23025_001E', 'B23025_002E', 'B23025_003E',
           'B23025_004E', 'B23025_005E',
           'B23025_006E', 'B23025_007E']

emp_col = ['Population 16 and Over', 'In Labor Force Total', 'Civilian Labor Force', 'Civilian Employed', 
           'Civilian Unemployed', 'Armed Forces', 'Not in Labor Force']

emp_filename = 'Over16EMP'
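
Raw counts are hard to compare across zip codes of different sizes, so the "employed or unemployed?" question is easier to answer with a rate. The sketch below divides civilian unemployed by the civilian labor force, using the column names from `emp_col`. The counts and the `zip_a`/`zip_b` keys are made up for illustration and are not Census figures.

```python
def unemployment_rate(row):
    """Civilian unemployed as a share of the civilian labor force (None if it is empty)."""
    labor_force = row['Civilian Labor Force']
    if labor_force == 0:
        return None  # some ZCTAs may report no labor force at all
    return row['Civilian Unemployed'] / labor_force

# Made-up counts for two placeholder zip codes.
sample = {
    'zip_a': {'Civilian Labor Force': 12000, 'Civilian Unemployed': 600},
    'zip_b': {'Civilian Labor Force': 0, 'Civilian Unemployed': 0},
}

rates = {zip_code: unemployment_rate(row) for zip_code, row in sample.items()}
print(rates)  # {'zip_a': 0.05, 'zip_b': None}
```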

In [37]:
#dl_nyscensus_data(emp_list, emp_col, emp_filename)

In [39]:
fm_age_list = [f'B01001_0{num}E' for num in range(26,50)]

fm_age_col = ['F Total', 'F Under 5 yrs', 'F 5 to 9 yrs', 'F 10 to 14 yrs', 'F 15 to 17 yrs', 'F 18 to 19 yrs', 'F 20', 'F 21',
              'F 22 to 24 yrs', 'F 25 to 29 yrs', 'F 30 to 34 yrs', 'F 35 to 39 yrs', 'F 40 to 44 yrs', 'F 45 to 49 yrs',
              'F 50 to 54 yrs', 'F 55 to 59 yrs', 'F 60 to 61 yrs', 'F 62 to 64 yrs', 'F 65 to 66 yrs', 'F 67 to 69 yrs', 'F 70 to 74 yrs', 
              'F 75 to 79 yrs', 'F 80 to 84 yrs', 'F 85 yrs and over']

fm_age_filename = 'FemaleByAge'

In [40]:
#dl_nyscensus_data(fm_age_list, fm_age_col, fm_age_filename)

In [42]:
m_age_list = [f'B01001_0{num}E' if num > 9 else f'B01001_00{num}E' for num in range(2,26)] 

# Swap only the leading 'F' so that an 'F' elsewhere in a label would be left alone.
m_age_col = ['M' + string[1:] for string in fm_age_col]

m_age_filename = 'MaleByAge'
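
The variable lists in this notebook all follow the same pattern: a table ID, an underscore, a three-digit zero-padded line number, and the suffix `E`. A format specifier can do the padding without the `if num > 9` branch. `acs_vars` is a hypothetical helper shown only as a sketch:

```python
def acs_vars(table, start, stop):
    """Return ACS estimate codes such as 'B01001_002E' for start <= n < stop."""
    return [f'{table}_{num:03d}E' for num in range(start, stop)]

m_age_list = acs_vars('B01001', 2, 26)
print(m_age_list[0], m_age_list[-1], len(m_age_list))  # B01001_002E B01001_025E 24
```

The `:03d` specifier also keeps working for tables with 100 or more lines, where the two-branch version would need a third case.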

In [43]:
#dl_nyscensus_data(m_age_list, m_age_col, m_age_filename)

In [45]:
race_list = ['B01003_001E', 'B02001_002E', 'B02001_003E', 'B02001_004E', 
            'B02001_005E', 'B02001_006E', 'B03001_003E']

race_col = ['Total Pop', 'White', 'African American', 'Native American', 'Asian', 'Pacific Islander', 'Hispanic or Latino']

race_filename = 'Race'

In [46]:
#dl_nyscensus_data(race_list, race_col, race_filename)

In [60]:
edu_list = [f'B15003_0{num}E' if num > 9 else f'B15003_00{num}E' for num in range(1,26)] 

# Grades 1 to 3 need their own ordinal suffix; every other grade takes 'th'.
ordinal = {1: 'st', 2: 'nd', 3: 'rd'}

edu_col = ['Population 25 and Over', 'No Schooling', 'Nursery', 'Kindergarten'] + [f"{num}{ordinal.get(num, 'th')} grade" if num != 12 else f'{num}th grade, no diploma' for num in range(1,13)] + ['Reg HS Diploma', 'GED', 'Less than 1 yr College', '1 yr or more College, no degree', "Associate's Degree", 
           "Bachelor's Degree", "Master's Degree", "Professional school Degree", "Doctorate Degree"]

edu_filename = 'Over25Edu'

In [62]:
#dl_nyscensus_data(edu_list, edu_col, edu_filename)

In [71]:
def age_col(agegrp, attribute, gender):
    """Build column names of the form '<gender> <age group> <attribute>', grouped by age group."""
    
    col_names = []
    
    for age in agegrp:
            
        col_names.extend([gender + ' ' + age + ' ' + att for att in attribute])
    
    return col_names
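
A quick call on a shortened input shows the order in which `age_col` produces names: every attribute for the first age group, then every attribute for the next. The function is repeated here so the block runs on its own, and the shortened lists are just for illustration.

```python
def age_col(agegrp, attribute, gender):
    col_names = []
    for age in agegrp:
        col_names.extend([gender + ' ' + age + ' ' + att for att in attribute])
    return col_names

cols = age_col(['18 to 24', '25 to 34'], ['Total', "Bachelor's degree"], 'M')
for name in cols:
    print(name)
# M 18 to 24 Total
# M 18 to 24 Bachelor's degree
# M 25 to 34 Total
# M 25 to 34 Bachelor's degree
```

This ordering has to match the order of the ACS variable codes, since columns are renamed by position.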

In [72]:
Age_Groups = ['18 to 24', '25 to 34', '35 to 44', '45 to 64', '65 years and over']

In [73]:
Degrees = ['Total', 'Less than 9th grade', '9th to 12th grade, no diploma', 'High school graduate', 
           'Some college, no degree', "Associate's degree", "Bachelor's degree", "Graduate or professional degree"]

In [76]:
m_edu_list = [f'B15001_0{num}E' if num > 9 else f'B15001_00{num}E' for num in range(3,43)] 

m_edu_col = age_col(Age_Groups, Degrees, 'M')

m_edu_filename = 'MaleOver18Edu'

In [78]:
#dl_nyscensus_data(m_edu_list, m_edu_col, m_edu_filename)

In [77]:
fm_edu_list = [f'B15001_0{num}E' for num in range(44,84)] 

fm_edu_col = age_col(Age_Groups, Degrees, 'F')

fm_edu_filename = 'FMaleOver18Edu'

In [79]:
#dl_nyscensus_data(fm_edu_list, fm_edu_col, fm_edu_filename)

In [84]:
income_list = [f'B19001_0{num}E' if num > 9 else f'B19001_00{num}E' for num in range(1,18)] 

income_col = ['Total', 'Less than $10,000', '$10,000 to $14,999', '$15,000 to $19,999', '$20,000 to $24,999',
              '$25,000 to $29,999', '$30,000 to $34,999', '$35,000 to $39,999', '$40,000 to $44,999',
              '$45,000 to $49,999', '$50,000 to $59,999', '$60,000 to $74,999', '$75,000 to $99,999',
              '$100,000 to $124,999', '$125,000 to $149,999', '$150,000 to $199,999', '$200,000 or more']

income_filename = 'Income'

In [85]:
#dl_nyscensus_data(income_list, income_col, income_filename)

In [86]:
Income_Age_Groups = ['Under 25', '25 to 44', '45 to 64', '65 years and over']

In [96]:
age_income_list = [f'B19037_0{num}E' if num > 9 else f'B19037_00{num}E' for num in range(2,70)] 

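# age_col always prepends a gender label, so pass a placeholder 'A' and strip the leading 'A ' from each name.
age_income_col = [name[2:] for name in age_col(Income_Age_Groups, income_col, 'A')]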

age_income_filename = 'IncomeByAge'

In [97]:
#dl_nyscensus_data(age_income_list, age_income_col, age_income_filename)

In [13]:
industry_list = [f'B08126_0{num}E' if num > 9 else f'B08126_00{num}E' for num in range(2,16)] 

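# Drop B08126_013E, which has no matching entry in industry_col (13 names for the 13 remaining variables).
industry_list.remove('B08126_013E')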

industry_col = ['Agriculture_forestry_fishing_hunting_mining', 'Construction', 'Manufacturing', 'Wholesale', 
                'Retail', 'Transportation_Warehousing_Utilities', 'Information', 'Financial Services',
                'Admin_Science_Professional Roles', 'Education_Health_Social Services', 'Entertainment_Arts_Food Services', 
                'Public Admin', 'Armed Forces']

industry_filename = 'Industry'

In [14]:
#dl_nyscensus_data(industry_list, industry_col, industry_filename)