# Description of the Problem and Discussion of the Background (Introduction Section)

### A team is looking at opening a restaurant in the Dallas-Fort Worth Metroplex, TX 

Per [CultureMap Dallas](https://dallas.culturemap.com/news/city-life/01-09-20-dfw-lead-population-growth-2020-2029-cushman-wakefield/), following a decade of eye-popping population growth, Dallas-Fort Worth is expected to once again lead the nation’s metro areas in the number of new residents this decade.   
New data from commercial real estate services company Cushman & Wakefield shows DFW gained 1,349,378 residents from 2010 through 2019. In terms of the number of new residents tallied during the past decade, DFW ranked first among U.S. metro areas, the data indicates.   
From 2020 through 2029, DFW is projected to tack on another 1,393,623 residents, Cushman & Wakefield says. For the second decade in a row, that would be the highest number of new residents for any metro area, the company says.   
Also, per the [Dallas Business Journal](https://www.bizjournals.com/dallas/news/2019/11/21/study-660-companies-moving-facilities-out-of.html), some 660 companies moved 765 facilities out of California in the past two years, and Dallas-Fort Worth has been the beneficiary of many of the relocations, according to a new report. The departures from the Golden State between January 2018 and now involve corporate headquarters, manufacturing facilities, data centers, research hubs, software and engineering centers and a few warehouses.

With this information in hand, a team is looking for a location in the Dallas-Fort Worth Metroplex to set up their restaurant where it can make the most profit.

# A description of the data and how it will be used to solve the problem. 

The data consists of the ZIP codes, ZIP code names, and populations of Collin and Denton counties, both located north of Dallas, where most of the migration has taken place and where the majority of the companies that moved to the Dallas-Fort Worth Metroplex are located.   
The ZIP codes will be passed to the geopy library to obtain their latitude and longitude, which will be used in the analysis.
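
The ZIP-code-to-coordinates step could look roughly like the sketch below. It is an illustration, not the notebook's actual code: `geocode_zip_codes` is a hypothetical helper, the `"country": "US"` filter is an added assumption, and `fake_geocode` is an offline stand-in with dummy coordinates so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Iterable, Optional, Tuple


def geocode_zip_codes(
    zip_codes: Iterable[int],
    geocode: Callable[[dict], Optional[object]],
) -> Dict[int, Optional[Tuple[float, float]]]:
    """Map each ZIP code to (latitude, longitude), or None when nothing matches."""
    coordinates = {}
    for zip_code in zip_codes:
        # Structured query; the country filter is an assumption added here
        # to avoid matching postal codes outside the United States.
        location = geocode({"postalcode": str(zip_code), "country": "US"})
        if location is None:
            coordinates[zip_code] = None
        else:
            coordinates[zip_code] = (location.latitude, location.longitude)
    return coordinates


def fake_geocode(query: dict) -> Optional[object]:
    """Offline stand-in for a real geocoder; the coordinates are dummy values."""
    known = {"75068": SimpleNamespace(latitude=1.0, longitude=2.0)}
    return known.get(query["postalcode"])


result = geocode_zip_codes([75068, 99999], fake_geocode)
print(result)
```

With geopy, the `geocode` argument would be `Nominatim(user_agent="dfw_explorer").geocode`, ideally wrapped in `geopy.extra.rate_limiter.RateLimiter` with `min_delay_seconds=1`, since the public Nominatim service asks for at most one request per second.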

**Obtaining the data and analyzing the neighborhoods**

  * Pandas will be used to scrape the ZIP code data from the [Zip Data Maps website](https://www.zipdatamaps.com/list-of-zip-codes-in-texas.php)
  * Two counties (Collin and Denton) will be selected for the analysis
  * Use Foursquare data to obtain information about restaurants
  * Data Visualization and Statistical Analysis
  * Analysis Using Clustering, Specifically K-Means Clustering
    - Optimize the number of clusters.
    - Visualization using a Choropleth Map
  * Compare the Neighborhoods to Find the Best Place for Starting up a Restaurant
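
One common way to settle on the number of clusters for K-Means is to compare silhouette scores across candidate values. The sketch below is not taken from this analysis: it assumes scikit-learn is available, `best_k_by_silhouette` is a hypothetical helper, and the points are synthetic stand-ins for per-ZIP-code restaurant features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(features, k_values):
    """Fit K-Means for each k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data: three tight, well-separated groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([c + rng.normal(scale=0.2, size=(30, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(features, range(2, 6))
print(best_k)
```

The silhouette score rewards clusters that are compact and far from their neighbors, so it penalizes both merging distinct groups and splitting a single group. The elbow method on inertia is another option that would fit the same loop.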

In [20]:
import numpy as np # library to handle data in a vectorized manner

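import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None) # show every column when displaying a DataFrame
pd.set_option('display.max_rows', None) # show every row when displaying a DataFrame

from geopy.geocoders import Nominatim # converts an address or postal code into latitude and longitude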

In [21]:
geolocator = Nominatim(user_agent="dfw_explorer")
# structured query for a single ZIP code; geocode returns None when nothing matches
location = geolocator.geocode({"postalcode": 75068})
latitude = location.latitude
longitude = location.longitude

In [22]:
latitude

33.169930105156986

In [17]:
df = pd.read_csv('State_of_Texas.csv')

In [19]:
df.tail()

Unnamed: 0,FIPS,area_name,year,age_group,total,total_male,total_female,nh_white_total,nh_white_male,nh_white_female,nh_black_total,nh_black_male,nh_black_female,hispanic_total,hispanic_male,hispanic_female,nh_asian_total,nh_asian_male,nh_asian_female,nh_other_total,nh_other_male,nh_other_female
241,0,State of Texas,2050,<18,10747450,5473969,5273481,2392698,1222060,1170638,1285703,653523,632180,5297102,2694022,2603080,1176045,600550,575495,595902,303814,292088
242,0,State of Texas,2050,18-24,4305000,2201386,2103614,1030850,531813,499037,541929,275910,266019,2007000,1021388,985612,517878,266305,251573,207343,105970,101373
243,0,State of Texas,2050,25-44,12789607,6496859,6292748,3347976,1729822,1618154,1696253,862713,833540,5372408,2710541,2661867,1840838,920058,920780,532132,273725,258407
244,0,State of Texas,2050,45-64,11193374,5653597,5539777,3493575,1795766,1697809,1548204,776003,772201,4424459,2213994,2210465,1397956,700455,697501,329180,167379,161801
245,0,State of Texas,2050,65+,8306674,3789269,4517405,3258740,1496591,1762149,958706,396599,562107,3090781,1425377,1665404,850162,404441,445721,148285,66261,82024


In [12]:
# read_html returns a list with one DataFrame per HTML table found on the page
collin_county_df = pd.read_html("https://www.zipdatamaps.com/collin-tx-county-zipcodes")

In [13]:
# keep the second table in the list, which holds the ZIP code listing
collin_county_df = collin_county_df[1]

In [14]:
denton_county_df = pd.read_html("https://www.zipdatamaps.com/denton-tx-county-zipcodes")

In [15]:
denton_county_df = denton_county_df[1]
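
Selecting the table by position (`[1]`) depends on the page layout staying the same. As a hedged alternative, the sketch below picks the table by a column label instead. `pick_table` is a hypothetical helper, not part of pandas, and the small in-memory frames only stand in for the list that `pd.read_html` returns.

```python
import pandas as pd


def pick_table(tables, required_column):
    """Return the first DataFrame whose column labels include required_column."""
    for table in tables:
        labels = set()
        for col in table.columns:
            # MultiIndex columns yield tuples; plain columns yield single labels
            labels.update(col if isinstance(col, tuple) else (col,))
        if required_column in labels:
            return table
    raise ValueError(f"no table has a column named {required_column!r}")


title = "List of Zipcodes in Denton County, Texas"
tables = [
    pd.DataFrame({"Menu": ["Home"]}),  # stand-in for an unrelated table on the page
    pd.DataFrame(
        [[75007.0, "Carrollton"]],
        columns=pd.MultiIndex.from_tuples(
            [(title, "ZIP Code"), (title, "ZIP Code Name")]
        ),
    ),
]

zip_table = pick_table(tables, "ZIP Code")
print(zip_table.shape)
```

`pd.read_html` also accepts a `match` argument that limits the result to tables containing the given text, which may be enough on its own.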

In [16]:
denton_county_df.head()

Unnamed: 0_level_0,"List of Zipcodes in Denton County, Texas","List of Zipcodes in Denton County, Texas","List of Zipcodes in Denton County, Texas","List of Zipcodes in Denton County, Texas"
Unnamed: 0_level_1,ZIP Code,ZIP Code Name,Population,Type
0,75007.0,Carrollton,51624.0,Non-Unique
1,75009.0,Celina,8785.0,Non-Unique
2,75010.0,Carrollton,21607.0,Non-Unique
3,75019.0,Coppell,38666.0,Non-Unique
4,75022.0,Flower Mound,22545.0,Non-Unique


In [27]:
denton_county_df.columns

MultiIndex([('List of Zipcodes in Denton County, Texas',      'ZIP Code'),
            ('List of Zipcodes in Denton County, Texas', 'ZIP Code Name'),
            ('List of Zipcodes in Denton County, Texas',    'Population'),
            ('List of Zipcodes in Denton County, Texas',          'Type')],
           )

In [28]:
# replace the two-level header with plain, single-level column names
denton_county_df.columns = ['ZIP Code', 'ZIP Code Name', 'Population', 'Type']

In [29]:
denton_county_df.columns

Index(['ZIP Code', 'ZIP Code Name', 'Population', 'Type'], dtype='object')
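
Instead of typing the column names by hand, the inner level of the `MultiIndex` can be kept directly. A minimal sketch, using a one-row frame rebuilt from the output above rather than the scraped table itself:

```python
import pandas as pd

title = "List of Zipcodes in Denton County, Texas"
columns = pd.MultiIndex.from_tuples(
    [
        (title, "ZIP Code"),
        (title, "ZIP Code Name"),
        (title, "Population"),
        (title, "Type"),
    ]
)
df = pd.DataFrame([[75007.0, "Carrollton", 51624.0, "Non-Unique"]], columns=columns)

flat = df.copy()
# keep only the innermost header level, dropping the repeated table title
flat.columns = flat.columns.get_level_values(-1)
print(list(flat.columns))
```

This stays correct if the site adds or reorders columns, whereas a hard-coded list would silently mislabel them.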

In [30]:
denton_county_df.head()

Unnamed: 0,ZIP Code,ZIP Code Name,Population,Type
0,75007.0,Carrollton,51624.0,Non-Unique
1,75009.0,Celina,8785.0,Non-Unique
2,75010.0,Carrollton,21607.0,Non-Unique
3,75019.0,Coppell,38666.0,Non-Unique
4,75022.0,Flower Mound,22545.0,Non-Unique


In [34]:
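# drop every row that contains a missing value
denton_county_df.dropna(inplace=True)

In [37]:
# ZIP codes were parsed as floats; store them as integers
denton_county_df['ZIP Code'] = denton_county_df['ZIP Code'].astype('int32')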

In [38]:
denton_county_df.head()

Unnamed: 0,ZIP Code,ZIP Code Name,Population,Type
0,75007,Carrollton,51624.0,Non-Unique
1,75009,Celina,8785.0,Non-Unique
2,75010,Carrollton,21607.0,Non-Unique
3,75019,Coppell,38666.0,Non-Unique
4,75022,Flower Mound,22545.0,Non-Unique
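
Since the same cleanup applies to both counties, it could be wrapped in a function and the results stacked into one table. The sketch below is an assumption about how that might look: `clean_county_table` and the `County` column are hypothetical additions, and only Denton rows from the output above are used as sample data.

```python
import numpy as np
import pandas as pd


def clean_county_table(table, county):
    """Drop incomplete rows, store ZIP codes as integers, and tag the county."""
    cleaned = table.dropna().copy()  # copy so the raw table is left untouched
    cleaned["ZIP Code"] = cleaned["ZIP Code"].astype("int32")
    cleaned["County"] = county
    return cleaned


denton_raw = pd.DataFrame(
    [
        [75007.0, "Carrollton", 51624.0, "Non-Unique"],
        [75009.0, "Celina", 8785.0, "Non-Unique"],
        [np.nan, np.nan, np.nan, np.nan],  # stand-in for an empty row in a scraped table
    ],
    columns=["ZIP Code", "ZIP Code Name", "Population", "Type"],
)

# The Collin table would be added to this dict under the key "Collin"
raw_tables = {"Denton": denton_raw}

combined = pd.concat(
    [clean_county_table(table, county) for county, table in raw_tables.items()],
    ignore_index=True,
)
print(combined)
```

Returning a new frame instead of using `inplace=True` keeps the raw scrape available for re-checking and avoids repeating the same cells for each county.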
