<font size="5"> Data Manipulation using DASK and Causal Inference using Diff-in-Diff approach

In [1]:
import requests, json, time, statistics, aiohttp, pandas as pd, numpy as np, altair as alt

from datetime import datetime, date, timedelta
from dateutil.relativedelta import relativedelta

# DASK
import dask.array as da
import dask.dataframe as dd
import dask.bag as bag
from dask.distributed import LocalCluster, Client

pd.set_option('display.max_columns', 500)


<font size="5"> Find Causal effect of Coronavirus on home prices in Los Angeles County

<font size="3"> Intuition:
    
> We hypothesize that Covid caused home prices to fall. Covid may have weakened demand for houses: some people lost jobs and moved to rental units, some fell sick and paused plans to buy a new house (or upgrade to a bigger one), and attitudes toward the perceived risk of real estate investing changed, among other factors.

<font size="3"> What is diff-in-diff (DiD) method?
    
> The DiD method is a common statistical technique for drawing causal inferences from observational data. It estimates the causal effect of a treatment (for example, a sudden change such as a new tax, a government policy change, or the arrival of Covid) on an outcome variable (e.g. the median price of single family homes) by comparing the average change over time in the outcome for the treatment group with the average change over time for the control group.

> With the unexpected arrival of Covid-19 in our county, it is reasonable to ask how it affected home prices. This is difficult to answer: to know how home prices in our county were causally impacted, we need to know what those prices would have been had the county never experienced Covid-19 (the counterfactual). However, Covid-19 did hit the county, so we never observe how home prices would have fared without it.
    
> DiD uses the change in the control group's outcome as a proxy for the counterfactual: what would have happened in the treatment group had there been no treatment. The difference between the treatment group's pre-to-post change and the control group's pre-to-post change is the estimate of the causal effect.
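
The 2x2 comparison described above can be sketched in a few lines of Python. The function name `did_estimate` and the toy outcome values are assumptions made for illustration; they are not taken from the datasets.

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Return the 2x2 difference-in-differences estimate in outcome units.

    The control group's change stands in for the change the treatment
    group would have seen without treatment (the counterfactual).
    """
    treat_change = treat_post - treat_pre
    ctrl_change = ctrl_post - ctrl_pre
    return treat_change - ctrl_change


# Hypothetical average outcomes, chosen only to make the arithmetic easy to follow.
effect = did_estimate(treat_pre=100.0, treat_post=130.0,
                      ctrl_pre=80.0, ctrl_post=90.0)
print(effect)  # 20.0: treatment rose by 30, control by 10
```

Note that a simple post-period comparison (130 vs 90) would mix the treatment effect with the pre-existing gap between the groups; subtracting each group's own pre-period level removes that gap.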

Datasets:
> The datasets we use to explore this question come from the ATTOM Data API (https://api.developer.attomdata.com/docs) and the USAFacts Covid-19 case counts (https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/state/california).
    

In [2]:
s = pd.read_csv('/Users/adityahpatel/Desktop/PYTHON PROGRAMS/Milestone Project/covid_confirmed_usafacts.csv')
cali = s[s.State == 'CA']   # Pulling out all counties of California, same can be done for NY state
cali.sample(4)

Unnamed: 0,countyFIPS,County Name,State,StateFIPS,2020-01-22,2020-01-23,...,2021-04-20,2021-04-21
208,6035,Lassen County,CA,6,0,0,...,5156,5155
230,6079,San Luis Obispo County,CA,6,0,0,...,20669,20691
214,6047,Merced County,CA,6,0,0,...,30422,30439
232,6083,Santa Barbara County,CA,6,0,0,...,33786,33803


In [3]:
# (*) Identify 2 counties (treatment and control), preferably in the same state,
#     where Covid cases reached a threshold at 2 DIFFERENT points in time
#     (Assumption: 10K cases, when people started losing jobs, rather than when the
#     1st case was reported; by then companies realized this virus was here to stay)

# (*) Los Angeles county hit 10,000 cases on 04/14/2020
#     San Francisco county hit 10,000 cases on 09/08/2020
#     If we define treatment as a county hitting 10,000 cases, then from 04/14/2020 to 09/08/2020
#     LA was in the treatment period, but SF was not.
#     Therefore, we pick LA as the treatment group and SF as the control group for DiD.
#     To simplify the analysis, we pick a 1 month period from 04/14/2020 to 05/14/2020 as the post-treatment period, AND
#     04/14/2019 to 05/14/2019 as the pre-treatment period. This allows a year-over-year (YOY) comparison at the same
#     point in the calendar.

# DEFINITIONS:
#  Outcome variable: Median Home Price of Single Family Residence (SFR) in US dollars.


# Parallel Trends assumption for DiD (untestable in practice): we assume that if Covid had not hit, the time trend
# in Los Angeles would have matched the time trend in San Francisco. This seems reasonable to us because both are top
# counties in the SAME state and have many similarities. Nothing unique to LA or SF changed over time: any confounder
# that changes over time affects the treatment (LA) and control (SF) groups in the same way. That is, there are no
# time-varying LA- or SF-specific confounders; anything that changes over time and might affect median home prices
# cannot differ systematically between LA and SF.

# In other words, there are no systematic differences in the changes of unobserved confounders.

# Calculate the simple Difference-in-Difference of Median Home Price of Single Family Homes as the causal effect of
# Covid on home prices in Los Angeles county
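
The threshold dates quoted in the comments above can be read off a county's cumulative case series. Here is a minimal sketch, assuming the series is available as (date string, cumulative count) pairs in chronological order. The helper name `first_date_at_or_above` and the sample counts are hypothetical, not real USAFacts values.

```python
from datetime import date


def first_date_at_or_above(daily_counts, threshold):
    """Return the first date whose cumulative count reaches the threshold.

    daily_counts: iterable of (ISO date string, cumulative count) pairs in
    chronological order. Returns None if the threshold is never reached.
    """
    for day, count in daily_counts:
        if count >= threshold:
            return date.fromisoformat(day)
    return None


# Hypothetical cumulative counts for one county (not real data).
sample_series = [
    ("2020-06-01", 9200),
    ("2020-06-02", 9700),
    ("2020-06-03", 10050),
    ("2020-06-04", 10500),
]

crossing = first_date_at_or_above(sample_series, threshold=10_000)
print(crossing)  # 2020-06-03
```

With the wide table loaded earlier, the pairs for one county could come from the date columns of that county's row, for example by iterating over `row.items()` on a pandas Series restricted to the date columns.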

<font size="3"> Data wrangling using DASK

In [4]:
# this function reads a batch of JSON files and returns a DASK bag containing all transactions. Each item in the 
# Dask bag is a dictionary

def work_read(filepath):
    b = bag.read_text(filepath).map(json.loads) 
    b = b.flatten()   # without flattening the bag, the results in the bag will be useless due to excessive nesting
    return b
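
To see what the flattening step does, here is a plain-Python analogue of `work_read` that needs no Dask. It assumes each line of a batch file holds a JSON array of transaction dictionaries, which is an assumption about the file layout; the sample lines and field values are made up.

```python
import json
from itertools import chain

# Two hypothetical lines of a batch file, each a JSON array of records.
lines = [
    '[{"id": 1}, {"id": 2}]',
    '[{"id": 3}]',
]

parsed = [json.loads(line) for line in lines]   # like bag.map(json.loads): one list per line
flat = list(chain.from_iterable(parsed))        # like bag.flatten(): one dictionary per item

print(len(parsed))  # 2 (nested: one list per line)
print(len(flat))    # 3 (flat: one record per item)
```

Without the flattening step, later filters would receive whole lists rather than individual transaction dictionaries.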

In [5]:
# Preprocessing filter: checks that the required keys exist before reading them, so we avoid
# "KeyError / key not found", and keeps only Single Family Residences (SFR)

def filters_preprocessing(bag_item):   # each bag_item is a dictionary
    has_proptype = 'proptype' in bag_item['summary']
    has_price = ('amount' in bag_item['sale']) and ('saleamt' in bag_item['sale']['amount'])
    # `and` short-circuits, so 'proptype' is read only after we know the key exists
    return has_proptype and has_price and bag_item['summary']['proptype'] == 'SFR'

def filter_dates_pre(bag_item):        # keep transactions within the pre-treatment period; discard others
    sale_date = datetime.strptime(bag_item['sale']['saleTransDate'], '%Y-%m-%d')
    return datetime(2019, 4, 14) <= sale_date < datetime(2019, 5, 15)

def filter_dates_post(bag_item):       # keep transactions within the post-treatment period; discard others
    sale_date = datetime.strptime(bag_item['sale']['saleTransDate'], '%Y-%m-%d')
    return datetime(2020, 4, 14) <= sale_date < datetime(2020, 5, 15)

def extractor(record): # This mapper takes a dictionary as input and returns the field of interest (sale price)
    return record['sale']['amount']['saleamt']
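
A quick way to sanity-check filters like these is to run them on a few hand-written records. The sketch below is self-contained: `is_sfr_with_price` and `in_window` are simplified stand-ins for the filters above, and the three records are hypothetical, shaped after the keys the filters read (`sale.saleTransDate`, `sale.amount.saleamt`, `summary.proptype`).

```python
from datetime import datetime


def is_sfr_with_price(record):
    """True for single family residences that carry a sale amount."""
    summary = record.get("summary", {})
    amount = record.get("sale", {}).get("amount", {})
    return summary.get("proptype") == "SFR" and "saleamt" in amount


def in_window(record, start, end):
    """True if the sale date falls in the half-open interval [start, end)."""
    sale_date = datetime.strptime(record["sale"]["saleTransDate"], "%Y-%m-%d")
    return start <= sale_date < end


# Hypothetical records (not real transactions).
records = [
    {"summary": {"proptype": "SFR"},
     "sale": {"saleTransDate": "2019-05-14", "amount": {"saleamt": 500000}}},
    {"summary": {"proptype": "CONDO"},
     "sale": {"saleTransDate": "2019-04-20", "amount": {"saleamt": 400000}}},
    {"summary": {"proptype": "SFR"},
     "sale": {"saleTransDate": "2019-05-15", "amount": {}}},
]

start, end = datetime(2019, 4, 14), datetime(2019, 5, 15)
kept = [r["sale"]["amount"]["saleamt"]
        for r in records
        if is_sfr_with_price(r) and in_window(r, start, end)]
print(kept)  # [500000]
```

The half-open interval keeps a sale dated 05/14 and drops one dated 05/15, which matches the 04/14 to 05/14 window used in the analysis.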

<font size="4"> Read batch files obtained by Attom API calls for both counties into Dask Bags 

In [6]:
path_losangeles = '/Users/adityahpatel/Desktop/PYTHON PROGRAMS/Housing Full Dataset/CO06037 *' 
path_sanfrancisco = '/Users/adityahpatel/Desktop/PYTHON PROGRAMS/Housing Full Dataset/CO06075 *'

bag_losangeles = work_read(path_losangeles)  # bag contains all transactions in LA county from 1/1/2019
bag_sanfrancisco = work_read(path_sanfrancisco)

<font size="4"> Data Processing using DASK 

In [7]:
# Preprocessing Required for Dask. Needed to make sure there are no keyerrors while parsing dictionaries in Bag

bag_losangeles = bag_losangeles.filter(lambda x:'saleTransDate' in x['sale'].keys()).filter(lambda x:'proptype' in x['summary'].keys())
bag_sanfrancisco = bag_sanfrancisco.filter(lambda x:'saleTransDate' in x['sale'].keys()).filter(lambda x:'proptype' in x['summary'].keys())

In [8]:
# Dask Bag filtering. Retains only those transactions in pre-treatment period for SingleFamilyResidences 
bag_losangeles_pre = bag_losangeles.filter(filters_preprocessing).filter(filter_dates_pre)  
bag_sanfrancisco_pre = bag_sanfrancisco.filter(filters_preprocessing).filter(filter_dates_pre)  

# Dask Bag filtering. Retains only those transactions in post-treatment period for SingleFamilyResidences 
bag_losangeles_post = bag_losangeles.filter(filters_preprocessing).filter(filter_dates_post) 
bag_sanfrancisco_post = bag_sanfrancisco.filter(filters_preprocessing).filter(filter_dates_post) 

# Put above 4 Dask bags into a python list
bags = [bag_losangeles_pre, bag_sanfrancisco_pre, bag_losangeles_post, bag_sanfrancisco_post]

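# Apply the Dask mapper and compute the median of each bag
def median_generator(records):         # `records` is a Dask bag of transaction dictionaries
    prices = records.map(extractor)    # Dask mapper used to extract the sale price of each transaction
    price_list = prices.compute()      # triggers the computation and returns a Python list
    return statistics.median(price_list)

median_la_pre, median_sf_pre, median_la_post, median_sf_post = [median_generator(b) for b in bags]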

In [9]:
print('Median home price of SingleFamilyHome in Los Angeles in pre-treatment period = {}'.format(median_la_pre))
print('Median home price of SingleFamilyHome in SanFrancisco in pre-treatment period = {}'.format(median_sf_pre))

print('Median home price of SingleFamilyHome in Los Angeles in post-treatment period = {}'.format(median_la_post))
print('Median home price of SingleFamilyHome in SanFrancisco in post-treatment period = {}'.format(median_sf_post))

Median home price of SingleFamilyHome in Los Angeles in pre-treatment period = 667250.0
Median home price of SingleFamilyHome in SanFrancisco in pre-treatment period = 1632500.0
Median home price of SingleFamilyHome in Los Angeles in post-treatment period = 665000.0
Median home price of SingleFamilyHome in SanFrancisco in post-treatment period = 1500000


In [10]:
causaleffect_percentage = ((median_la_post-median_la_pre)/median_la_pre) - ((median_sf_post - median_sf_pre)/median_sf_pre)
print("The causal effect of Coronavirus on SFR home prices in Los Angeles = {} %".format(causaleffect_percentage * 100))

The causal effect of Coronavirus on SFR home prices in Los Angeles = 7.779180965506637 %
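
For reference, here is the arithmetic behind this figure, using the four medians printed above. Each group's change is taken relative to its own pre-period median, so the result is a difference in percentage points:

```latex
\begin{aligned}
\Delta_{\text{LA}} &= \frac{665000 - 667250}{667250} \approx -0.34\% \\
\Delta_{\text{SF}} &= \frac{1500000 - 1632500}{1632500} \approx -8.12\% \\
\text{DiD} &= \Delta_{\text{LA}} - \Delta_{\text{SF}} \approx -0.34\% - (-8.12\%) \approx 7.78 \text{ percentage points}
\end{aligned}
```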


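<font size="4"> Finding: *Under the parallel-trends assumption, Covid-19 raised single family home prices in Los Angeles County by about 7.8 percentage points relative to the no-Covid counterfactual, contrary to our hypothesis.* The Los Angeles median itself fell about 0.3% year over year, while the San Francisco median fell about 8.1%.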

<font size="3"> We would be interested in extending this causal analysis to more complex versions of DiD (such as the "*DiD Regression with Fixed Effects*" method) to include multiple periods (instead of a single month) and multiple counties (instead of just 1 treatment county and 1 control county). 
    
<font size="3">The <u>"*DiD regression with Fixed Effects*" </u>method would allow us to report standard errors and assess the robustness of the causal estimate. We could also combine the *DiD* method with the <u>*Nearest Neighbor Matching*</u> method. This would involve 'matching' known 'treatment' units with simulated counterfactual 'control' units: characteristically equivalent units which did not receive treatment. 
    
<font size="3">*However, it is our intent to restrict the scope of causal inference to "basic" Exploratory Data Analysis as per Milestone # 1 guidelines.*
