## Part C: Using SQL to join tables and filter data

To answer the research question (Is there any relation between the COVID infection rate in different New York State counties and their distance from New York City?), we need to join the following three tables:

1. `nycovidcase`
2. `nydistance`
3. `nypopulation`


The combined (joined) view will contain the following information:

From table 1: the total number of cases on 2020-11-13 (case counts are cumulative and this is the last date in the CSV, so the total is highest on this date)

From table 2: the distance of each county from Manhattan, NYC (fips code = '36061')

From table 3: the population of each county (to determine the infection rate among the population of each county)
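
As a quick check on the arithmetic, the infection rate used in this section is the cumulative case count expressed as a percentage of the county population. The sketch below uses a hypothetical helper name, `infection_rate`, and the Allegany figures that appear in the join output in this section (589 cases, population 46091).

```python
def infection_rate(cases, population):
    """Return cumulative cases as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100


# Allegany County on 2020-11-13: 589 cases, population 46091
print(round(infection_rate(589, 46091), 4))  # 1.2779
```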

### 1. Connecting to a database

In [1]:
import pymysql
conn = pymysql.connect(host='YourDatabaseUrlOrLocalHost',
                       port=3306,
                       user='UserName',
                       passwd='*******',
                       db='database_name', autocommit=True)
cur = conn.cursor(pymysql.cursors.DictCursor)

### 2. Defining a function for executing SQL commands

In [2]:
def execute_sql(sql):
    '''Fetch (from a connected database) the result of the given SQL command,
       which should be written between triple quotes.
       Establishing the database connection and the cursor as 'cur' is a prerequisite.'''
    cur.execute(sql)
    result = cur.fetchall()
    return result
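
A possible variant, shown only as a sketch: passing the cursor explicitly and binding values as query parameters avoids relying on the global `cur` and on hard-coded literals such as the date. The name `run_query` is hypothetical and not part of the original notebook. PyMySQL uses `%s` as its placeholder style.

```python
def run_query(cursor, sql, params=None):
    """Execute `sql` on `cursor`, binding `params` if given, and return all rows."""
    if params is None:
        cursor.execute(sql)
    else:
        cursor.execute(sql, params)
    return cursor.fetchall()


# Example call with the PyMySQL cursor from step 1 (not executed here):
# run_query(cur, 'SELECT * FROM nycovidcase WHERE date = %s LIMIT 5', ('2020-11-13',))
```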

### 3. Checking distance data

In [3]:
execute_sql('''SELECT * FROM nydistance LIMIT 5''')

[{'distanceid': 1, 'fips1': '36013', 'distance': 359.0, 'fips2': '36103'},
 {'distanceid': 2, 'fips1': '36103', 'distance': 359.0, 'fips2': '36013'},
 {'distanceid': 3, 'fips1': '36063', 'distance': 357.0, 'fips2': '36103'},
 {'distanceid': 4, 'fips1': '36103', 'distance': 357.0, 'fips2': '36063'},
 {'distanceid': 5, 'fips1': '36029', 'distance': 337.0, 'fips2': '36103'}]

### 4. Checking population data

In [4]:
execute_sql('''SELECT * FROM nypopulation LIMIT 5''')

[{'populationid': 1, 'county': 'Kings', 'population': 2559903},
 {'populationid': 2, 'county': 'Queens', 'population': 2253858},
 {'populationid': 3, 'county': 'New York', 'population': 1628706},
 {'populationid': 4, 'county': 'Suffolk', 'population': 1476601},
 {'populationid': 5, 'county': 'Bronx', 'population': 1418207}]

### 5. Checking COVID case data

In [5]:
execute_sql('''SELECT * FROM nycovidcase LIMIT 5''')

[{'caseid': 1,
  'date': datetime.date(2020, 3, 1),
  'county': 'New York City',
  'state': 'New York',
  'fips': '',
  'cases': 1,
  'death': 0},
 {'caseid': 2,
  'date': datetime.date(2020, 3, 2),
  'county': 'New York City',
  'state': 'New York',
  'fips': '',
  'cases': 1,
  'death': 0},
 {'caseid': 3,
  'date': datetime.date(2020, 3, 3),
  'county': 'New York City',
  'state': 'New York',
  'fips': '',
  'cases': 2,
  'death': 0},
 {'caseid': 4,
  'date': datetime.date(2020, 3, 4),
  'county': 'New York City',
  'state': 'New York',
  'fips': '',
  'cases': 2,
  'death': 0},
 {'caseid': 5,
  'date': datetime.date(2020, 3, 4),
  'county': 'Westchester',
  'state': 'New York',
  'fips': '36119',
  'cases': 9,
  'death': 0}]
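
The sample above shows that the 'New York City' rows carry an empty `fips` value. Rows like these cannot match any county in a join on the FIPS code. A small sketch for listing such counties from the fetched rows follows. The helper name `counties_without_fips` is hypothetical, and the two sample rows are trimmed copies of the output above.

```python
def counties_without_fips(rows):
    """Return the sorted names of counties whose rows have an empty fips code."""
    return sorted({row['county'] for row in rows if not row['fips']})


sample = [
    {'county': 'New York City', 'fips': '', 'cases': 2},
    {'county': 'Westchester', 'fips': '36119', 'cases': 9},
]
print(counties_without_fips(sample))  # ['New York City']
```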

### Joining the tables and checking the fields required to answer the research question

In [6]:
execute_sql('''SELECT c.county,
       d.distance AS Distance_From_Manhattan,
       c.cases AS Total_Case_On_Nov13_2020,
       p.population,
       c.cases/p.population*100 AS Infection_Rate
FROM nycovidcase c, nydistance d, nypopulation p
WHERE d.fips1 = '36061'
AND d.fips2 = c.fips
AND p.county = c.county
AND c.date = '2020-11-13'
LIMIT 5
''')

[{'county': 'Allegany',
  'Distance_From_Manhattan': 233.0,
  'Total_Case_On_Nov13_2020': 589,
  'population': 46091,
  'Infection_Rate': Decimal('1.2779')},
 {'county': 'Cattaraugus',
  'Distance_From_Manhattan': 264.0,
  'Total_Case_On_Nov13_2020': 661,
  'population': 76117,
  'Infection_Rate': Decimal('0.8684')},
 {'county': 'Cayuga',
  'Distance_From_Manhattan': 204.0,
  'Total_Case_On_Nov13_2020': 609,
  'population': 76576,
  'Infection_Rate': Decimal('0.7953')},
 {'county': 'Chautauqua',
  'Distance_From_Manhattan': 300.0,
  'Total_Case_On_Nov13_2020': 1210,
  'population': 126903,
  'Infection_Rate': Decimal('0.9535')},
 {'county': 'Chemung',
  'Distance_From_Manhattan': 172.0,
  'Total_Case_On_Nov13_2020': 2280,
  'population': 83456,
  'Infection_Rate': Decimal('2.7320')}]
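
To see how the three join conditions interact without access to the MySQL database, the same query can be tried on a tiny in-memory SQLite stand-in. This is only a sketch. The FIPS codes and the rows marked hypothetical are illustrative, the case, distance, and population figures for Allegany and Chemung are copied from the output above, and the query uses explicit `JOIN ... ON` syntax, which is equivalent to the comma-separated form. SQLite truncates integer division, so the rate is computed as `cases * 100.0 / population` here. In MySQL, `/` returns a decimal.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the MySQL database.
demo = sqlite3.connect(":memory:")
demo.row_factory = sqlite3.Row  # rows can be converted to dicts, like DictCursor rows

demo.executescript("""
CREATE TABLE nycovidcase (date TEXT, county TEXT, fips TEXT, cases INTEGER);
CREATE TABLE nydistance (fips1 TEXT, fips2 TEXT, distance REAL);
CREATE TABLE nypopulation (county TEXT, population INTEGER);

INSERT INTO nycovidcase VALUES
    ('2020-11-13', 'Allegany', '36003', 589),
    ('2020-11-13', 'Chemung', '36015', 2280),
    ('2020-11-12', 'Chemung', '36015', 2200),   -- hypothetical earlier row, removed by the date filter
    ('2020-11-13', 'New York City', '', 100);   -- hypothetical count; empty fips never matches
INSERT INTO nydistance VALUES
    ('36061', '36003', 233.0),
    ('36061', '36015', 172.0),
    ('36103', '36003', 300.0);                  -- hypothetical row, not measured from Manhattan
INSERT INTO nypopulation VALUES
    ('Allegany', 46091),
    ('Chemung', 83456);
""")

rows = demo.execute("""
    SELECT c.county,
           d.distance AS Distance_From_Manhattan,
           c.cases AS Total_Case_On_Nov13_2020,
           p.population,
           c.cases * 100.0 / p.population AS Infection_Rate
    FROM nycovidcase c
    JOIN nydistance d ON d.fips2 = c.fips
    JOIN nypopulation p ON p.county = c.county
    WHERE d.fips1 = '36061'
      AND c.date = '2020-11-13'
    ORDER BY c.county
""").fetchall()

joined = [dict(row) for row in rows]
for row in joined:
    print(row['county'], row['Distance_From_Manhattan'], round(row['Infection_Rate'], 4))
```

Only the Allegany and Chemung rows for 2020-11-13 survive all three conditions. The 'New York City' row drops out because its empty `fips` has no partner in `nydistance`.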

### Creating a view with the fields joined from the three tables

In [7]:
execute_sql('''CREATE VIEW Infection_Rate_Vs_Distance_2020_11_13 AS
SELECT c.county,
       d.distance AS Distance_From_Manhattan,
       c.cases AS Total_Case_On_Nov13_2020,
       p.population,
       c.cases/p.population*100 AS Infection_Rate
FROM nycovidcase c, nydistance d, nypopulation p
WHERE d.fips1 = '36061'
AND d.fips2 = c.fips
AND p.county = c.county
AND c.date = '2020-11-13';
''')

()

### Exporting the content of the view as a CSV
The content of the view will now be exported to "Infection_Rate_Vs_Distance_2020_11_13.csv" for further operations.
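
The notebook does not show how the export was carried out. One possible approach, sketched here with Python's standard `csv` module, is to fetch the rows of the view and write them out with `csv.DictWriter`. The helper name `write_rows_to_csv` is hypothetical. It assumes the rows are dictionaries, as returned by the `DictCursor` used earlier.

```python
import csv


def write_rows_to_csv(rows, path):
    """Write a list of dict rows to `path`, using the first row's keys as the header."""
    if not rows:
        raise ValueError("no rows to write")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Example call with the helper from step 2 (not executed here):
# rows = execute_sql('''SELECT * FROM Infection_Rate_Vs_Distance_2020_11_13''')
# write_rows_to_csv(rows, 'Infection_Rate_Vs_Distance_2020_11_13.csv')
```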