# Team Spray and Pray

<img src="http://media1.giphy.com/media/215lvo8ytxpFC/giphy.gif">

<img src='https://github.com/ga-students/DC-DSI4/blob/master/curriculum/01-week/1.01-welcome/data-science-workflow-final.jpg?raw=true'>

# 1) Identify the Problem


West Nile virus is most commonly spread to humans through infected mosquitos. Around 20% of people who become infected with the virus develop symptoms, ranging from a persistent fever to serious neurological illnesses that can result in death.

In 2002, the first human cases of West Nile virus were reported in Chicago. By 2004 the City of Chicago and the Chicago Department of Public Health (CDPH) had established a comprehensive surveillance and control program that is still in effect today.

Every week from late spring through the fall, mosquitos in traps across the city are tested for the virus. The results of these tests influence when and where the city will spray airborne pesticides to control adult mosquito populations.

Given weather, location, testing, and spraying data, __this competition asks you to predict when and where different species of mosquitos will test positive for West Nile virus.__ A more accurate method of predicting outbreaks of West Nile virus in mosquitos will help the City of Chicago and CDPH allocate resources more efficiently and effectively toward preventing transmission of this potentially deadly virus.


Submissions are evaluated on area under the ROC curve between the predicted probability that West Nile Virus is present and the observed outcomes.
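
For intuition about this metric, here is a minimal sketch in plain Python. It treats the area under the ROC curve as the probability that a randomly chosen positive record gets a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not part of the competition materials.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (WnvPresent) and predicted probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

Only the ordering of the scores affects the result, so a model is rewarded for ranking positive records above negative ones rather than for the exact probability values.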

### Submission File

For each record in the test set, you should __predict a real-valued probability that WNV is present.__ The file should contain a header and have the following format:

    Id,WnvPresent

    1, 0

    2, 1

    3, 0.9

    4, 0.2

    etc.
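
A small sketch of writing a file in this format with Python's standard `csv` module follows. The helper name `write_submission` and the example predictions are hypothetical; the range check is our own safeguard, not a competition rule.

```python
import csv
import io


def write_submission(ids, probabilities, handle):
    """Write Id,WnvPresent rows with a header to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Id", "WnvPresent"])
    for record_id, prob in zip(ids, probabilities):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probability out of range: %r" % (prob,))
        writer.writerow([record_id, prob])


# Hypothetical predictions for four test records, written to an in-memory buffer
buffer = io.StringIO()
write_submission([1, 2, 3, 4], [0.0, 1.0, 0.9, 0.2], buffer)
submission_text = buffer.getvalue()
```

To write to disk instead, pass a file opened in text mode in place of the in-memory buffer.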
    
### Strategy

We intend to merge the train, weather and spray data files and review the data where West Nile virus is present.
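
As a rough sketch of the merge step, the example below joins toy trap records to toy weather rows on `Date`. The values are made up, and using Station 1 only is an assumption made here so that each date matches exactly one weather row; it is not a decision recorded in this notebook.

```python
import pandas as pd

# Toy stand-ins for train.csv and weather.csv (values are illustrative)
train = pd.DataFrame({
    "Date": ["2007-05-29", "2007-05-29", "2007-06-05"],
    "Trap": ["T002", "T007", "T002"],
    "WnvPresent": [0, 0, 1],
})
weather = pd.DataFrame({
    "Station": [1, 2, 1, 2],
    "Date": ["2007-05-29", "2007-05-29", "2007-06-05", "2007-06-05"],
    "Tmax": [88, 88, 78, 79],
})

# Assumption: keep Station 1 only, so each date has a single weather row
station1 = weather[weather["Station"] == 1].drop(columns="Station")

# A left join keeps every trap record even when a weather row is missing;
# validate raises an error if a date unexpectedly matches several weather rows
merged = train.merge(station1, on="Date", how="left", validate="many_to_one")
```

The spray data has no key in common with the trap records beyond date and coordinates, so it would need a distance-based join rather than a plain merge.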

### Questions

What information do we need in order to predict whether or not a mosquito is positive for West Nile virus (WNV)?  _For example, do we need the spray data and weather data, or only the training data?_

What species of mosquitos tested positive for WNV?

What was the most common time period for mosquitos with WNV?  _Late spring - fall, per briefing, but let's try to get more specific._

What location has the most mosquitos with WNV?
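
The species, timing, and location questions can each be approached by filtering to positive records and counting. The sketch below uses made-up records shaped like train.csv; the `Month` column is a derived helper introduced here for illustration.

```python
import pandas as pd

# Toy records shaped like train.csv (values are illustrative)
df = pd.DataFrame({
    "Date": ["2007-07-11", "2007-08-01", "2007-08-01", "2007-08-15", "2007-09-04"],
    "Species": ["CULEX RESTUANS", "CULEX PIPIENS", "CULEX PIPIENS",
                "CULEX PIPIENS/RESTUANS", "CULEX RESTUANS"],
    "Trap": ["T002", "T900", "T900", "T115", "T002"],
    "WnvPresent": [0, 1, 1, 1, 0],
})
df["Month"] = pd.to_datetime(df["Date"]).dt.month  # derived helper column

positives = df[df["WnvPresent"] == 1]

species_counts = positives["Species"].value_counts()  # which species test positive
month_counts = positives["Month"].value_counts()      # when positives occur
trap_counts = positives["Trap"].value_counts()        # where positives occur
```

Raw counts favor heavily sampled traps and months, so comparing positive rates (positives divided by records tested) may give a fairer picture.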

# 2) Acquire the Data

We will start with three files: train, spray and weather. We will use the test data to verify which variables are available for model building.

We downloaded the files from the Kaggle website and saved them in a shared Git repository that Adi created. Adi made Troy and me contributors, and we forked the repository to our local drives.

We are going to use Python to complete this challenge. More specifically, we will use pandas to read, clean, and explore the data (EDA), along with NumPy and pandas-profiling.

In [1]:
# Let's import the libraries we will need at this time
import pandas as pd
import numpy as np
import pandas_profiling as pdp

# 3) Parse the Data

##### Data Description

In this competition, you will be analyzing weather data and GIS data and predicting whether or not West Nile virus is present, for a given time, location, and species. 

Every year from late May to early October, public health workers in Chicago set up mosquito traps scattered across the city. Every week from Monday through Wednesday, these traps collect mosquitos, and the mosquitos are tested for the presence of West Nile virus before the end of the week. The test results include the number of mosquitos, the mosquito species, and whether or not West Nile virus is present in the cohort.

##### Main dataset

These test results are organized so that when the number of mosquitos exceeds 50, the surplus is split into another record (another row in the dataset), capping the number of mosquitos per record at 50.
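
Because of this cap, the full catch for a given date, trap, and species is the sum over its rows. Here is a minimal sketch of recombining split records with pandas, on made-up data. Whether to aggregate at all is a modeling choice, and labeling the combined row positive when any split row is positive is an assumption made here.

```python
import pandas as pd

# Toy records: trap T900 caught 120 mosquitos, stored as 50 + 50 + 20
df = pd.DataFrame({
    "Date": ["2007-08-01"] * 4,
    "Trap": ["T900", "T900", "T900", "T115"],
    "Species": ["CULEX PIPIENS"] * 4,
    "NumMosquitos": [50, 50, 20, 7],
    "WnvPresent": [0, 1, 0, 0],
})

# One row per (Date, Trap, Species): total count, positive if any split row was
combined = (
    df.groupby(["Date", "Trap", "Species"], as_index=False)
      .agg({"NumMosquitos": "sum", "WnvPresent": "max"})
)
```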

The locations of the traps are described by the block number and street name. For your convenience, we have mapped these attributes into Longitude and Latitude in the dataset. Please note that these are derived locations. For example, Block=79, and Street= "W FOSTER AVE" gives us an approximate address of "7900 W FOSTER AVE, Chicago, IL", which translates to (41.974089,-87.824812) on the map.

Some traps are "satellite traps". These are traps that are set up near (usually within 6 blocks) an established trap to enhance surveillance efforts. Satellite traps are postfixed with letters. For example, T220A is a satellite trap to T220. 
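
If satellite traps should be grouped with their parent trap, the ID can be split with a regular expression. This is a sketch under an assumed ID pattern ("T", digits, optional letter suffix) inferred from the T220A example; the helper name `parent_trap` is hypothetical.

```python
import re

# Assumed ID pattern: "T" + digits, optionally followed by a letter suffix
TRAP_PATTERN = re.compile(r"^(T\d+)([A-Z]*)$")


def parent_trap(trap_id):
    """Return (parent_id, is_satellite) for a trap ID such as 'T220A'."""
    match = TRAP_PATTERN.match(trap_id)
    if match is None:
        raise ValueError("unexpected trap id: %r" % (trap_id,))
    base, suffix = match.groups()
    return base, bool(suffix)


satellite = parent_trap("T220A")   # satellite trap of T220
established = parent_trap("T220")  # established trap
```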

Please note that not all the locations are tested at all times. Also, records exist only when a particular species of mosquitos is found at a certain trap at a certain time. In the test set, we ask you for all combinations/permutations of possible predictions and are only scoring the observed ones.

##### Spray Data

The City of Chicago also does spraying to kill mosquitos. You are given the GIS data for their spray efforts in 2011 and 2013. Spraying can reduce the number of mosquitos in the area, and therefore might eliminate the appearance of West Nile virus.
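
One way to turn the spray points into a feature is to flag traps that were sprayed nearby shortly before a test. The sketch below is only an illustration: the 1 km radius, the 14-day window, the function names, and the spray coordinates are all assumptions, and the distance uses an equirectangular approximation rather than an exact geodesic.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0


def approx_distance_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; reasonable over city-scale distances."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(dx, dy)


def sprayed_recently(trap, test_date, sprays, radius_km=1.0, window_days=14):
    """True if any spray point lies within radius_km of the trap and was
    applied between 0 and window_days days before test_date."""
    for spray_date, lat, lon in sprays:
        age = (test_date - spray_date).days
        close = approx_distance_km(trap[0], trap[1], lat, lon) <= radius_km
        if 0 <= age <= window_days and close:
            return True
    return False


# Hypothetical spray points: (date, latitude, longitude)
sprays = [
    (date(2013, 8, 15), 41.9750, -87.8900),
    (date(2011, 9, 7), 41.9800, -87.7900),
]
trap_location = (41.974689, -87.890615)  # example trap coordinates

near = sprayed_recently(trap_location, date(2013, 8, 22), sprays)
stale = sprayed_recently(trap_location, date(2013, 10, 1), sprays)
```

Since spray data covers only 2011 and 2013, a feature like this would be empty for the other years.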

##### Weather Data

It is believed that hot and dry conditions are more favorable for West Nile virus than cold and wet. We provide you with the dataset from NOAA of the weather conditions of 2007 to 2014, during the months of the tests. 

Station 1: CHICAGO O'HARE INTERNATIONAL AIRPORT Lat: 41.995 Lon: -87.933 Elev: 662 ft. above sea level
Station 2: CHICAGO MIDWAY INTL ARPT Lat: 41.786 Lon: -87.752 Elev: 612 ft. above sea level
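
With two stations, each trap can be paired with the closer one before joining weather data. The following sketch uses the station coordinates listed above and the haversine formula; the function names are hypothetical, and choosing the nearest station (rather than averaging both) is an assumption.

```python
import math

# Station coordinates as listed above
STATIONS = {
    1: (41.995, -87.933),  # Chicago O'Hare
    2: (41.786, -87.752),  # Chicago Midway
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_station(lat, lon):
    """Return the station number closest to the given trap location."""
    return min(STATIONS, key=lambda s: haversine_km(lat, lon, *STATIONS[s]))


north_trap = nearest_station(41.974689, -87.890615)  # a trap near O'Hare
south_trap = nearest_station(41.673408, -87.599862)  # a trap on the far south side
```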

### File descriptions

##### train.csv, test.csv - the training and test set of the main dataset. The training set consists of data from 2007, 2009, 2011, and 2013, while in the test set you are requested to predict the test results for 2008, 2010, 2012, and 2014.

Id: the id of the record

Date: date that the WNV test is performed

Address: approximate address of the location of trap. This is used to send to the GeoCoder. 

Species: the species of mosquitos

Block: block number of address

Street: street name

Trap: Id of the trap

AddressNumberAndStreet: approximate address returned from GeoCoder

Latitude, Longitude: Latitude and Longitude returned from GeoCoder

AddressAccuracy: accuracy returned from GeoCoder

NumMosquitos: number of mosquitos caught in this trap

WnvPresent: whether West Nile Virus was present in these mosquitos. 1 means WNV is present, and 0 means not present. 

##### spray.csv - GIS data of spraying efforts in 2011 and 2013

Date, Time: the date and time of the spray

Latitude, Longitude: the Latitude and Longitude of the spray

##### weather.csv - weather data from 2007 to 2014. 

Column descriptions in noaa_weather_qclcd_documentation.pdf.

##### sampleSubmission.csv - a sample submission file in the correct format

In [3]:
# Let's read in the files we are going to use for this competition

df_train = pd.read_csv('../assets/train.csv')
df_s = pd.read_csv('../assets/spray.csv')
df_w = pd.read_csv('../assets/weather.csv')
df_test = pd.read_csv('../assets/test.csv')
df_sample = pd.read_csv('../assets/sampleSubmission.csv')

In [11]:
# Notes for the Train data file (profiled after the duplicate rows were dropped)
# Trap has 136 distinct values and Address has 138 distinct values
# 94.8% of the records tested negative for WNV
# 813 duplicate rows were removed, leaving 9,693 of the original 10,506 records
pdp.ProfileReport(df_train)

0,1
Number of variables,13
Number of observations,9693
Total Missing (%),0.0%
Total size in memory,984.5 KiB
Average record size in memory,104.0 B

0,1
Numeric,7
Categorical,6
Date,0
Text (Unique),0
Rejected,0

Variable: Address

0,1
Distinct count,138
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0

0,1
"ORD Terminal 5, O'Hare International Airport, Chicago, IL 60666, USA",612
"South Doty Avenue, Chicago, IL, USA",212
"South Stony Island Avenue, Chicago, IL, USA",181
Other values (135),8688

Value,Count,Frequency (%),Unnamed: 3
"ORD Terminal 5, O'Hare International Airport, Chicago, IL 60666, USA",612,6.3%,
"South Doty Avenue, Chicago, IL, USA",212,2.2%,
"South Stony Island Avenue, Chicago, IL, USA",181,1.9%,
"4200 West 127th Street, Alsip, IL 60803, USA",172,1.8%,
"4100 North Oak Park Avenue, Chicago, IL 60634, USA",168,1.7%,
"2200 North Cannon Drive, Chicago, IL 60614, USA",157,1.6%,
"7000 West Armitage Avenue, Chicago, IL 60707, USA",154,1.6%,
"University of Illinois at Chicago, 1100 South Ashland Avenue, Chicago, IL 60607, USA",149,1.5%,
"5000 South Central Avenue, Chicago, IL 60638, USA",146,1.5%,
"1100 Roosevelt Road, Chicago, IL 60608, USA",146,1.5%,

Variable: AddressAccuracy

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,7.9411
Minimum,3
Maximum,9
Zeros (%),0.0%

0,1
Minimum,3
5-th percentile,5
Q1,8
Median,8
Q3,9
95-th percentile,9
Maximum,9
Range,6
Interquartile range,1

0,1
Standard deviation,1.351
Coef of variation,0.17013
Kurtosis,1.8098
Mean,7.9411
MAD,0.88085
Skewness,-1.6478
Sum,76973
Variance,1.8252
Memory size,75.8 KiB

Value,Count,Frequency (%),Unnamed: 3
8,4522,46.7%,
9,3780,39.0%,
5,1302,13.4%,
3,89,0.9%,

Value,Count,Frequency (%),Unnamed: 3
3,89,0.9%,
5,1302,13.4%,
8,4522,46.7%,
9,3780,39.0%,

Variable: AddressNumberAndStreet

0,1
Distinct count,138
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0

0,1
"1000 W OHARE AIRPORT, Chicago, IL",612
"1200 S DOTY AVE, Chicago, IL",212
"1000 S STONY ISLAND AVE, Chicago, IL",181
Other values (135),8688

Value,Count,Frequency (%),Unnamed: 3
"1000 W OHARE AIRPORT, Chicago, IL",612,6.3%,
"1200 S DOTY AVE, Chicago, IL",212,2.2%,
"1000 S STONY ISLAND AVE, Chicago, IL",181,1.9%,
"4200 W 127TH PL, Chicago, IL",172,1.8%,
"4100 N OAK PARK AVE, Chicago, IL",168,1.7%,
"2200 N CANNON DR, Chicago, IL",157,1.6%,
"7000 W ARMITAGE AVENUE, Chicago, IL",154,1.6%,
"1100 S ASHLAND AVE, Chicago, IL",149,1.5%,
"5000 S CENTRAL AVE, Chicago, IL",146,1.5%,
"1100 W ROOSEVELT, Chicago, IL",146,1.5%,

Variable: Block

0,1
Distinct count,64
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,37.232
Minimum,10
Maximum,98
Zeros (%),0.0%

0,1
Minimum,10
5-th percentile,10
Q1,13
Median,36
Q3,58
95-th percentile,82
Maximum,98
Range,88
Interquartile range,45

0,1
Standard deviation,24.33
Coef of variation,0.65349
Kurtosis,-0.81152
Mean,37.232
MAD,20.676
Skewness,0.54304
Sum,360887
Variance,591.97
Memory size,75.8 KiB

Value,Count,Frequency (%),Unnamed: 3
10,1428,14.7%,
11,707,7.3%,
22,485,5.0%,
13,343,3.5%,
37,316,3.3%,
17,304,3.1%,
42,289,3.0%,
70,289,3.0%,
12,275,2.8%,
52,271,2.8%,

Value,Count,Frequency (%),Unnamed: 3
10,1428,14.7%,
11,707,7.3%,
12,275,2.8%,
13,343,3.5%,
14,97,1.0%,

Value,Count,Frequency (%),Unnamed: 3
90,77,0.8%,
91,108,1.1%,
93,21,0.2%,
96,30,0.3%,
98,23,0.2%,

Variable: Date

0,1
Distinct count,95
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0

0,1
2007-08-01,456
2007-08-15,266
2007-08-24,185
Other values (92),8786

Value,Count,Frequency (%),Unnamed: 3
2007-08-01,456,4.7%,
2007-08-15,266,2.7%,
2007-08-24,185,1.9%,
2007-08-21,185,1.9%,
2007-10-04,185,1.9%,
2007-08-07,184,1.9%,
2013-08-08,181,1.9%,
2013-08-01,172,1.8%,
2013-07-19,170,1.8%,
2011-07-25,168,1.7%,

Variable: Latitude

0,1
Distinct count,138
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,41.848
Minimum,41.645
Maximum,42.017
Zeros (%),0.0%

0,1
Minimum,41.645
5-th percentile,41.673
Q1,41.75
Median,41.867
Q3,41.955
95-th percentile,41.995
Maximum,42.017
Range,0.37282
Interquartile range,0.20419

0,1
Standard deviation,0.10942
Coef of variation,0.0026146
Kurtosis,-1.3726
Mean,41.848
MAD,0.098515
Skewness,-0.14924
Sum,405630
Variance,0.011972
Memory size,75.8 KiB

Value,Count,Frequency (%),Unnamed: 3
41.974689,612,6.3%,
41.673408,212,2.2%,
41.726465,181,1.9%,
41.662014,172,1.8%,
41.95469,168,1.7%,
41.921965,157,1.6%,
41.916265,154,1.6%,
41.868077,149,1.5%,
41.801498,146,1.5%,
41.867108,146,1.5%,

Value,Count,Frequency (%),Unnamed: 3
41.644612,17,0.2%,
41.659112,105,1.1%,
41.662014,172,1.8%,
41.673408,212,2.2%,
41.678618,127,1.3%,

Value,Count,Frequency (%),Unnamed: 3
42.008314,135,1.4%,
42.009876,49,0.5%,
42.010412,63,0.6%,
42.011601,65,0.7%,
42.01743,67,0.7%,

Variable: Longitude

0,1
Distinct count,138
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-87.703
Minimum,-87.931
Maximum,-87.532
Zeros (%),0.0%

0,1
Minimum,-87.931
5-th percentile,-87.891
Q1,-87.761
Median,-87.698
Q3,-87.643
95-th percentile,-87.547
Maximum,-87.532
Range,0.39936
Interquartile range,0.1179

0,1
Standard deviation,0.093464
Coef of variation,-0.0010657
Kurtosis,-0.3225
Mean,-87.703
MAD,0.074303
Skewness,-0.30742
Sum,-850100
Variance,0.0087355
Memory size,75.8 KiB

Value,Count,Frequency (%),Unnamed: 3
-87.890615,612,6.3%,
-87.599862,212,2.2%,
-87.585413,181,1.9%,
-87.724608,172,1.8%,
-87.800991,168,1.7%,
-87.632085,157,1.6%,
-87.800515,154,1.6%,
-87.666901,149,1.5%,
-87.654224,146,1.5%,
-87.763416,146,1.5%,

Value,Count,Frequency (%),Unnamed: 3
-87.930995,128,1.3%,
-87.890615,612,6.3%,
-87.862995,77,0.8%,
-87.832763,133,1.4%,
-87.824812,35,0.4%,

Value,Count,Frequency (%),Unnamed: 3
-87.538693,105,1.1%,
-87.536497,57,0.6%,
-87.535198,139,1.4%,
-87.531657,23,0.2%,
-87.531635,41,0.4%,

Variable: NumMosquitos

0,1
Distinct count,50
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,10.211
Minimum,1
Maximum,50
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,1
Q1,2
Median,4
Q3,13
95-th percentile,45
Maximum,50
Range,49
Interquartile range,11

0,1
Standard deviation,13.139
Coef of variation,1.2868
Kurtosis,2.1764
Mean,10.211
MAD,9.8014
Skewness,1.7774
Sum,98971
Variance,172.63
Memory size,75.8 KiB

Value,Count,Frequency (%),Unnamed: 3
1,2261,23.3%,
2,1284,13.2%,
3,882,9.1%,
4,590,6.1%,
5,485,5.0%,
6,398,4.1%,
7,326,3.4%,
50,312,3.2%,
8,242,2.5%,
9,234,2.4%,

Value,Count,Frequency (%),Unnamed: 3
1,2261,23.3%,
2,1284,13.2%,
3,882,9.1%,
4,590,6.1%,
5,485,5.0%,

Value,Count,Frequency (%),Unnamed: 3
46,42,0.4%,
47,37,0.4%,
48,35,0.4%,
49,35,0.4%,
50,312,3.2%,

Variable: Species

0,1
Distinct count,7
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
CULEX PIPIENS/RESTUANS,4469
CULEX RESTUANS,2672
CULEX PIPIENS,2239
Other values (4),313

Value,Count,Frequency (%),Unnamed: 3
CULEX PIPIENS/RESTUANS,4469,46.1%,
CULEX RESTUANS,2672,27.6%,
CULEX PIPIENS,2239,23.1%,
CULEX TERRITANS,221,2.3%,
CULEX SALINARIUS,85,0.9%,
CULEX TARSALIS,6,0.1%,
CULEX ERRATICUS,1,0.0%,

Variable: Street

0,1
Distinct count,128
Unique (%),1.3%
Missing (%),0.0%
Missing (n),0

0,1
W OHARE AIRPORT,612
S ASHLAND AVE,262
S STONY ISLAND AVE,214
Other values (125),8605

Value,Count,Frequency (%),Unnamed: 3
W OHARE AIRPORT,612,6.3%,
S ASHLAND AVE,262,2.7%,
S STONY ISLAND AVE,214,2.2%,
S DOTY AVE,212,2.2%,
N OAK PARK AVE,199,2.1%,
W 51ST ST,185,1.9%,
N PULASKI RD,173,1.8%,
W 127TH PL,172,1.8%,
N CANNON DR,166,1.7%,
W ARMITAGE AVENUE,154,1.6%,

Variable: Trap

0,1
Distinct count,136
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0

0,1
T900,612
T115,212
T138,181
Other values (133),8688

Value,Count,Frequency (%),Unnamed: 3
T900,612,6.3%,
T115,212,2.2%,
T138,181,1.9%,
T135,172,1.8%,
T002,168,1.7%,
T054,157,1.6%,
T151,154,1.6%,
T090,149,1.5%,
T048,146,1.5%,
T031,146,1.5%,

Variable: WnvPresent

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.051893
Minimum,0
Maximum,1
Zeros (%),94.8%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,1
Range,1
Interquartile range,0

0,1
Standard deviation,0.22182
Coef of variation,4.2746
Kurtosis,14.333
Mean,0.051893
MAD,0.0984
Skewness,4.0411
Sum,503
Variance,0.049205
Memory size,75.8 KiB

Value,Count,Frequency (%),Unnamed: 3
0,9190,94.8%,
1,503,5.2%,

Variable: index (row index)

0,1
Distinct count,9693
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5422
Minimum,0
Maximum,10505
Zeros (%),0.0%

0,1
Minimum,0.0
5-th percentile,488.6
Q1,2918.0
Median,5523.0
Q3,8013.0
95-th percentile,10016.0
Maximum,10505.0
Range,10505.0
Interquartile range,5095.0

0,1
Standard deviation,3019.2
Coef of variation,0.55685
Kurtosis,-1.1545
Mean,5422
MAD,2598.7
Skewness,-0.089367
Sum,52555426
Variance,9115900
Memory size,75.8 KiB

Value,Count,Frequency (%),Unnamed: 3
2047,1,0.0%,
5464,1,0.0%,
3411,1,0.0%,
1362,1,0.0%,
7505,1,0.0%,
5456,1,0.0%,
9550,1,0.0%,
3403,1,0.0%,
1354,1,0.0%,
7497,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.0%,
1,1,0.0%,
2,1,0.0%,
3,1,0.0%,
4,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
10501,1,0.0%,
10502,1,0.0%,
10503,1,0.0%,
10504,1,0.0%,
10505,1,0.0%,

Unnamed: 0,Date,Address,Species,Block,Street,Trap,AddressNumberAndStreet,Latitude,Longitude,AddressAccuracy,NumMosquitos,WnvPresent
0,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0
1,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0
2,2007-05-29,"6200 North Mandell Avenue, Chicago, IL 60646, USA",CULEX RESTUANS,62,N MANDELL AVE,T007,"6200 N MANDELL AVE, Chicago, IL",41.994991,-87.769279,9,1,0
3,2007-05-29,"7900 West Foster Avenue, Chicago, IL 60656, USA",CULEX PIPIENS/RESTUANS,79,W FOSTER AVE,T015,"7900 W FOSTER AVE, Chicago, IL",41.974089,-87.824812,8,1,0
4,2007-05-29,"7900 West Foster Avenue, Chicago, IL 60656, USA",CULEX RESTUANS,79,W FOSTER AVE,T015,"7900 W FOSTER AVE, Chicago, IL",41.974089,-87.824812,8,4,0


In [10]:
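# Inspect the duplicate rows, then remove them.
# A notebook displays only a cell's last expression, so the duplicates are
# kept in a variable for inspection instead of being silently discarded.
duplicate_rows = df_train[df_train.duplicated()].sort_values(by='Date')
df_train.drop_duplicates(inplace=True)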
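
Since the dataset description says large catches are split into rows capped at 50, some identical rows may be legitimate split records rather than accidental duplicates. The sketch below, on made-up records, shows how dropping duplicates changes the mosquito total, which may be worth checking before committing to the drop.

```python
import pandas as pd

# Toy records: two identical rows of 50 may be one large catch split in two
df = pd.DataFrame({
    "Date": ["2007-08-01", "2007-08-01", "2007-08-01"],
    "Trap": ["T900", "T900", "T115"],
    "Species": ["CULEX PIPIENS"] * 3,
    "NumMosquitos": [50, 50, 3],
    "WnvPresent": [0, 0, 0],
})

n_duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
deduped = df.drop_duplicates()             # returns a new frame; df is unchanged

total_before = int(df["NumMosquitos"].sum())
total_after = int(deduped["NumMosquitos"].sum())
```

In this toy case the drop removes 50 mosquitos from the T900 total, so the choice affects any feature built on catch size.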

In [5]:
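# Notes for the Weather data file
# DewPoint is highly correlated with Tmin (correlation 0.90), so the report rejects it
# Water1 has the constant value 'M', the code this file uses for missing data
# "Total Missing 0.0%" is misleading: missing values are stored as the string 'M', not as nulls
pdp.ProfileReport(df_w)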

0,1
Number of variables,22
Number of observations,2944
Total Missing (%),0.0%
Total size in memory,506.1 KiB
Average record size in memory,176.0 B

0,1
Numeric,5
Categorical,15
Date,0
Text (Unique),0
Rejected,2

Variable: AvgSpeed

0,1
Distinct count,178
Unique (%),6.0%
Missing (%),0.0%
Missing (n),0

0,1
6.9,63
5.8,60
7.4,55
Other values (175),2766

Value,Count,Frequency (%),Unnamed: 3
6.9,63,2.1%,
5.8,60,2.0%,
7.4,55,1.9%,
8.1,49,1.7%,
7.0,47,1.6%,
7.7,44,1.5%,
9.2,44,1.5%,
8.0,43,1.5%,
6.0,42,1.4%,
7.3,42,1.4%,

Variable: CodeSum

0,1
Distinct count,98
Unique (%),3.3%
Missing (%),0.0%
Missing (n),0

0,1
,1609
RA,296
RA BR,238
Other values (95),801

Value,Count,Frequency (%),Unnamed: 3
,1609,54.7%,
RA,296,10.1%,
RA BR,238,8.1%,
BR,110,3.7%,
TSRA RA BR,92,3.1%,
BR HZ,81,2.8%,
RA DZ BR,65,2.2%,
TSRA RA,43,1.5%,
HZ,39,1.3%,
RA BR HZ,38,1.3%,

Variable: Cool

0,1
Distinct count,31
Unique (%),1.1%
Missing (%),0.0%
Missing (n),0

0,1
0,1147
8,138
12,117
Other values (28),1542

Value,Count,Frequency (%),Unnamed: 3
0,1147,39.0%,
8,138,4.7%,
12,117,4.0%,
5,117,4.0%,
10,110,3.7%,
6,109,3.7%,
9,107,3.6%,
7,104,3.5%,
4,103,3.5%,
13,102,3.5%,

Variable: Date

0,1
Distinct count,1472
Unique (%),50.0%
Missing (%),0.0%
Missing (n),0

0,1
2013-06-02,2
2009-09-14,2
2014-05-31,2
Other values (1469),2938

Value,Count,Frequency (%),Unnamed: 3
2013-06-02,2,0.1%,
2009-09-14,2,0.1%,
2014-05-31,2,0.1%,
2014-09-26,2,0.1%,
2009-09-13,2,0.1%,
2009-09-12,2,0.1%,
2009-09-11,2,0.1%,
2009-09-10,2,0.1%,
2009-09-17,2,0.1%,
2009-09-16,2,0.1%,

Variable: Depart

0,1
Distinct count,42
Unique (%),1.4%
Missing (%),0.0%
Missing (n),0

0,1
M,1472
2,93
-1,84
Other values (39),1295

Value,Count,Frequency (%),Unnamed: 3
M,1472,50.0%,
2,93,3.2%,
-1,84,2.9%,
-2,80,2.7%,
5,77,2.6%,
7,76,2.6%,
1,76,2.6%,
3,75,2.5%,
0,74,2.5%,
-3,72,2.4%,

Variable: Depth

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
M,1472
0,1472

Value,Count,Frequency (%),Unnamed: 3
M,1472,50.0%,
0,1472,50.0%,

Variable: DewPoint (rejected; highly correlated with Tmin)

0,1
Correlation,0.90436

Variable: Heat

0,1
Distinct count,31
Unique (%),1.1%
Missing (%),0.0%
Missing (n),0

0,1
0,1870
4,88
1,86
Other values (28),900

Value,Count,Frequency (%),Unnamed: 3
0,1870,63.5%,
4,88,3.0%,
1,86,2.9%,
2,81,2.8%,
8,67,2.3%,
3,66,2.2%,
5,61,2.1%,
15,57,1.9%,
7,49,1.7%,
12,49,1.7%,

Variable: PrecipTotal

0,1
Distinct count,168
Unique (%),5.7%
Missing (%),0.0%
Missing (n),0

0,1
0.00,1577
T,318
0.01,127
Other values (165),922

Value,Count,Frequency (%),Unnamed: 3
0.00,1577,53.6%,
T,318,10.8%,
0.01,127,4.3%,
0.02,63,2.1%,
0.03,46,1.6%,
0.04,36,1.2%,
0.05,32,1.1%,
0.08,28,1.0%,
0.12,28,1.0%,
0.06,27,0.9%,

Variable: ResultDir

0,1
Distinct count,36
Unique (%),1.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,17.495
Minimum,1
Maximum,36
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,2
Q1,7
Median,19
Q3,25
95-th percentile,34
Maximum,36
Range,35
Interquartile range,18

0,1
Standard deviation,10.064
Coef of variation,0.57523
Kurtosis,-1.1616
Mean,17.495
MAD,8.7028
Skewness,-0.062597
Sum,51505
Variance,101.28
Memory size,23.1 KiB

Value,Count,Frequency (%),Unnamed: 3
21,156,5.3%,
3,139,4.7%,
23,138,4.7%,
19,138,4.7%,
24,122,4.1%,
4,121,4.1%,
20,118,4.0%,
22,116,3.9%,
5,113,3.8%,
6,111,3.8%,

Value,Count,Frequency (%),Unnamed: 3
1,62,2.1%,
2,110,3.7%,
3,139,4.7%,
4,121,4.1%,
5,113,3.8%,

Value,Count,Frequency (%),Unnamed: 3
32,47,1.6%,
33,34,1.2%,
34,49,1.7%,
35,37,1.3%,
36,72,2.4%,

Variable: ResultSpeed

0,1
Distinct count,190
Unique (%),6.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.9607
Minimum,0.1
Maximum,24.1
Zeros (%),0.0%

0,1
Minimum,0.1
5-th percentile,1.9
Q1,4.3
Median,6.4
Q3,9.2
95-th percentile,13.5
Maximum,24.1
Range,24.0
Interquartile range,4.9

0,1
Standard deviation,3.5875
Coef of variation,0.5154
Kurtosis,0.7425
Mean,6.9607
MAD,2.8541
Skewness,0.73492
Sum,20492
Variance,12.87
Memory size,23.1 KiB

Value,Count,Frequency (%),Unnamed: 3
5.9,49,1.7%,
6.4,47,1.6%,
6.2,42,1.4%,
5.3,42,1.4%,
4.9,38,1.3%,
4.8,37,1.3%,
6.0,37,1.3%,
5.8,37,1.3%,
6.3,37,1.3%,
8.3,36,1.2%,

Value,Count,Frequency (%),Unnamed: 3
0.1,1,0.0%,
0.2,1,0.0%,
0.3,3,0.1%,
0.4,3,0.1%,
0.5,3,0.1%,

Value,Count,Frequency (%),Unnamed: 3
21.7,1,0.0%,
21.8,1,0.0%,
22.6,1,0.0%,
22.7,1,0.0%,
24.1,1,0.0%,

Variable: SeaLevel

0,1
Distinct count,102
Unique (%),3.5%
Missing (%),0.0%
Missing (n),0

0,1
30.00,96
29.94,85
29.98,85
Other values (99),2678

Value,Count,Frequency (%),Unnamed: 3
30.00,96,3.3%,
29.94,85,2.9%,
29.98,85,2.9%,
29.92,83,2.8%,
29.89,82,2.8%,
30.05,81,2.8%,
29.95,80,2.7%,
29.91,80,2.7%,
30.02,80,2.7%,
29.93,79,2.7%,

Variable: SnowFall

0,1
Distinct count,4
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
M,1472
0.0,1459
T,12

Value,Count,Frequency (%),Unnamed: 3
M,1472,50.0%,
0.0,1459,49.6%,
T,12,0.4%,
0.1,1,0.0%,

Variable: Station

0,1
Distinct count,2
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.5
Minimum,1
Maximum,2
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,1.0
Q1,1.0
Median,1.5
Q3,2.0
95-th percentile,2.0
Maximum,2.0
Range,1.0
Interquartile range,1.0

0,1
Standard deviation,0.50008
Coef of variation,0.33339
Kurtosis,-2.0014
Mean,1.5
MAD,0.5
Skewness,0
Sum,4416
Variance,0.25008
Memory size,23.1 KiB

Value,Count,Frequency (%),Unnamed: 3
1,1472,50.0%,
2,1472,50.0%,

Variable: StnPressure

0,1
Distinct count,104
Unique (%),3.5%
Missing (%),0.0%
Missing (n),0

0,1
29.34,128
29.28,124
29.26,123
Other values (101),2569

Value,Count,Frequency (%),Unnamed: 3
29.34,128,4.3%,
29.28,124,4.2%,
29.26,123,4.2%,
29.21,107,3.6%,
29.31,106,3.6%,
29.23,104,3.5%,
29.36,96,3.3%,
29.41,91,3.1%,
29.39,89,3.0%,
29.29,86,2.9%,

Variable: Sunrise

0,1
Distinct count,122
Unique (%),4.1%
Missing (%),0.0%
Missing (n),0

0,1
-,1472
0416,104
0417,64
Other values (119),1304

Value,Count,Frequency (%),Unnamed: 3
-,1472,50.0%,
0416,104,3.5%,
0417,64,2.2%,
0419,40,1.4%,
0418,32,1.1%,
0420,32,1.1%,
0422,32,1.1%,
0425,32,1.1%,
0421,24,0.8%,
0423,24,0.8%,

Variable: Sunset

0,1
Distinct count,119
Unique (%),4.0%
Missing (%),0.0%
Missing (n),0

0,1
-,1472
1931,96
1930,56
Other values (116),1320

Value,Count,Frequency (%),Unnamed: 3
-,1472,50.0%,
1931,96,3.3%,
1930,56,1.9%,
1929,48,1.6%,
1923,32,1.1%,
1925,32,1.1%,
1927,32,1.1%,
1928,32,1.1%,
1926,24,0.8%,
1918,24,0.8%,

Variable: Tavg

0,1
Distinct count,60
Unique (%),2.0%
Missing (%),0.0%
Missing (n),0

0,1
73,138
77,117
70,117
Other values (57),2572

Value,Count,Frequency (%),Unnamed: 3
73,138,4.7%,
77,117,4.0%,
70,117,4.0%,
75,110,3.7%,
71,109,3.7%,
74,107,3.6%,
72,104,3.5%,
69,103,3.5%,
78,102,3.5%,
76,100,3.4%,

Variable: Tmax

0,1
Distinct count,63
Unique (%),2.1%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,76.166
Minimum,41
Maximum,104
Zeros (%),0.0%

0,1
Minimum,41
5-th percentile,54
Q1,69
Median,78
Q3,85
95-th percentile,92
Maximum,104
Range,63
Interquartile range,16

0,1
Standard deviation,11.462
Coef of variation,0.15049
Kurtosis,-0.26791
Mean,76.166
MAD,9.3499
Skewness,-0.55888
Sum,224233
Variance,131.38
Memory size,23.1 KiB

Value,Count,Frequency (%),Unnamed: 3
84,128,4.3%,
79,121,4.1%,
82,118,4.0%,
81,117,4.0%,
83,109,3.7%,
80,107,3.6%,
85,107,3.6%,
86,101,3.4%,
77,100,3.4%,
87,97,3.3%,

Value,Count,Frequency (%),Unnamed: 3
41,1,0.0%,
42,1,0.0%,
44,5,0.2%,
45,5,0.2%,
46,9,0.3%,

Value,Count,Frequency (%),Unnamed: 3
100,3,0.1%,
101,4,0.1%,
102,2,0.1%,
103,2,0.1%,
104,1,0.0%,

Variable: Tmin

0,1
Distinct count,54
Unique (%),1.8%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,57.81
Minimum,29
Maximum,83
Zeros (%),0.0%

0,1
Minimum,29
5-th percentile,40
Q1,50
Median,59
Q3,66
95-th percentile,73
Maximum,83
Range,54
Interquartile range,16

0,1
Standard deviation,10.382
Coef of variation,0.17959
Kurtosis,-0.577
Mean,57.81
MAD,8.6016
Skewness,-0.3523
Sum,170194
Variance,107.78
Memory size,23.1 KiB

Value,Count,Frequency (%),Unnamed: 3
63,121,4.1%,
65,111,3.8%,
60,109,3.7%,
61,106,3.6%,
62,105,3.6%,
66,103,3.5%,
68,103,3.5%,
57,103,3.5%,
64,101,3.4%,
59,100,3.4%,

Value,Count,Frequency (%),Unnamed: 3
29,6,0.2%,
31,7,0.2%,
32,7,0.2%,
33,10,0.3%,
34,12,0.4%,

Value,Count,Frequency (%),Unnamed: 3
79,9,0.3%,
80,3,0.1%,
81,3,0.1%,
82,2,0.1%,
83,1,0.0%,

Variable: Water1 (rejected; constant value)

0,1
Constant value,M

Variable: WetBulb

0,1
Distinct count,48
Unique (%),1.6%
Missing (%),0.0%
Missing (n),0

0,1
63,135
65,131
59,129
Other values (45),2549

Value,Count,Frequency (%),Unnamed: 3
63,135,4.6%,
65,131,4.4%,
59,129,4.4%,
61,123,4.2%,
64,121,4.1%,
62,118,4.0%,
67,117,4.0%,
66,113,3.8%,
60,111,3.8%,
69,107,3.6%,

Unnamed: 0,Station,Date,Tmax,Tmin,Tavg,Depart,DewPoint,WetBulb,Heat,Cool,Sunrise,Sunset,CodeSum,Depth,Water1,SnowFall,PrecipTotal,StnPressure,SeaLevel,ResultSpeed,ResultDir,AvgSpeed
0,1,2007-05-01,83,50,67,14,51,56,0,2,0448,1849,,0,M,0.0,0.0,29.1,29.82,1.7,27,9.2
1,2,2007-05-01,84,52,68,M,51,57,0,3,-,-,,M,M,M,0.0,29.18,29.82,2.7,25,9.6
2,1,2007-05-02,59,42,51,-3,42,47,14,0,0447,1850,BR,0,M,0.0,0.0,29.38,30.09,13.0,4,13.4
3,2,2007-05-02,60,43,52,M,42,47,13,0,-,-,BR HZ,M,M,M,0.0,29.44,30.08,13.3,2,13.4
4,1,2007-05-03,66,46,56,2,40,48,9,0,0446,1851,,0,M,0.0,0.0,29.39,30.12,11.7,7,11.9
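
The sample rows show placeholder codes such as `M` in columns that are otherwise numeric, so these columns need converting before they can be used in a model. A minimal sketch on toy values follows. The helper name `to_number` is hypothetical, and replacing the trace code `T` with 0.005 is an assumption rather than a documented convention of this dataset.

```python
import pandas as pd

# Toy rows mimicking weather.csv: 'M' marks missing values, 'T' marks trace amounts
w = pd.DataFrame({
    "Tavg": ["67", "M", "51"],
    "Depart": ["14", "M", "-3"],
    "PrecipTotal": ["0.00", "  T", "0.12"],
})


def to_number(series, trace_value=None):
    """Convert a string column to floats; unparseable codes such as 'M' become NaN."""
    s = series.str.strip()
    if trace_value is not None:
        # Keep every value except 'T', which is swapped for the trace amount
        s = s.where(s != "T", str(trace_value))
    return pd.to_numeric(s, errors="coerce")


cleaned = pd.DataFrame({
    "Tavg": to_number(w["Tavg"]),
    "Depart": to_number(w["Depart"]),
    "PrecipTotal": to_number(w["PrecipTotal"], trace_value=0.005),  # assumed trace amount
})
missing_per_column = cleaned.isnull().sum()
```

After conversion, `isnull()` reports the real number of missing values, which the string-coded columns hide.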


In [44]:
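# Notes for the Spray data file (profiled after the duplicate rows were dropped)
# 541 duplicate rows were removed, leaving 14,294 of the original 14,835 records
# The Time variable is missing 584 values (4.1% of the remaining rows)

pdp.ProfileReport(df_s)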

0,1
Number of variables,5
Number of observations,14294
Total Missing (%),0.8%
Total size in memory,558.4 KiB
Average record size in memory,40.0 B

0,1
Numeric,3
Categorical,2
Date,0
Text (Unique),0
Rejected,0

Variable: Date

0,1
Distinct count,10
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
2013-08-15,2668
2013-08-29,2302
2013-07-17,2202
Other values (7),7122

Value,Count,Frequency (%),Unnamed: 3
2013-08-15,2668,18.7%,
2013-08-29,2302,16.1%,
2013-07-17,2202,15.4%,
2013-07-25,1607,11.2%,
2013-08-22,1587,11.1%,
2011-09-07,1573,11.0%,
2013-08-08,1195,8.4%,
2013-09-05,924,6.5%,
2013-08-16,141,1.0%,
2011-08-29,95,0.7%,

Variable: Latitude

0,1
Distinct count,12887
Unique (%),90.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,41.902
Minimum,41.714
Maximum,42.396
Zeros (%),0.0%

0,1
Minimum,41.714
5-th percentile,41.725
Q1,41.783
Median,41.938
Q3,41.977
95-th percentile,42.007
Maximum,42.396
Range,0.68206
Interquartile range,0.19455

0,1
Standard deviation,0.1051
Coef of variation,0.0025083
Kurtosis,1.7606
Mean,41.902
MAD,0.084178
Skewness,-0.019855
Sum,598940
Variance,0.011046
Memory size,111.7 KiB

Value,Count,Frequency (%),Unnamed: 3
41.9953963549,11,0.1%,
41.9827717583,10,0.1%,
41.9944843118,9,0.1%,
41.9856518944,9,0.1%,
41.9840678196,8,0.1%,
41.9935242665,8,0.1%,
41.9969324275,8,0.1%,
41.9848838581,8,0.1%,
41.9950123368,7,0.0%,
41.9944363096,7,0.0%,

Value,Count,Frequency (%),Unnamed: 3
41.713925,1,0.0%,
41.714005,1,0.0%,
41.7140416667,1,0.0%,
41.7140983333,1,0.0%,
41.7141116667,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
42.395095,1,0.0%,
42.3952183333,1,0.0%,
42.3953516667,1,0.0%,
42.3956966667,1,0.0%,
42.3959833333,1,0.0%,

Variable: Longitude

0,1
Distinct count,13007
Unique (%),91.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-87.735
Minimum,-88.096
Maximum,-87.587
Zeros (%),0.0%

0,1
Minimum,-88.096
5-th percentile,-87.818
Q1,-87.79
Median,-87.725
Q3,-87.692
95-th percentile,-87.623
Maximum,-87.587
Range,0.50974
Interquartile range,0.0975

0,1
Standard deviation,0.067599
Coef of variation,-0.00077049
Kurtosis,3.7346
Mean,-87.735
MAD,0.053829
Skewness,-0.74207
Sum,-1254100
Variance,0.0045696
Memory size,111.7 KiB

Value,Count,Frequency (%),Unnamed: 3
-87.8068627874,9,0.1%,
-87.8069107897,9,0.1%,
-87.8167512547,7,0.0%,
-87.8056147285,6,0.0%,
-87.805710733,6,0.0%,
-87.806958792,5,0.0%,
-87.8104149553,5,0.0%,
-87.8137271118,5,0.0%,
-87.8115670097,5,0.0%,
-87.8131990869,5,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-88.0964683333,1,0.0%,
-88.0964466667,1,0.0%,
-88.096445,1,0.0%,
-88.0964433333,1,0.0%,
-88.09644,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-87.5867916667,1,0.0%,
-87.5867866667,1,0.0%,
-87.586775,1,0.0%,
-87.586755,1,0.0%,
-87.5867266667,1,0.0%,

Variable: Time

0,1
Distinct count,8584
Unique (%),62.6%
Missing (%),4.1%
Missing (n),584

0,1
8:57:46 PM,5
8:55:46 PM,5
9:40:27 PM,5
Other values (8580),13695
(Missing),584

Value,Count,Frequency (%),Unnamed: 3
8:57:46 PM,5,0.0%,
8:55:46 PM,5,0.0%,
9:40:27 PM,5,0.0%,
9:37:27 PM,5,0.0%,
9:05:56 PM,5,0.0%,
8:58:56 PM,5,0.0%,
9:38:27 PM,5,0.0%,
8:57:56 PM,5,0.0%,
9:35:47 PM,5,0.0%,
9:31:27 PM,5,0.0%,

Variable: index (row index)

0,1
Distinct count,14294
Unique (%),100.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,7669
Minimum,0
Maximum,14834
Zeros (%),0.0%

0,1
Minimum,0.0
5-th percentile,1255.7
Q1,4114.2
Median,7687.5
Q3,11261.0
95-th percentile,14119.0
Maximum,14834.0
Range,14834.0
Interquartile range,7146.5

0,1
Standard deviation,4158.5
Coef of variation,0.54225
Kurtosis,-1.1637
Mean,7669
MAD,3592
Skewness,-0.026414
Sum,109620580
Variance,17293000
Memory size,111.7 KiB

Value,Count,Frequency (%),Unnamed: 3
2047,1,0.0%,
6758,1,0.0%,
2692,1,0.0%,
12931,1,0.0%,
8833,1,0.0%,
10880,1,0.0%,
4727,1,0.0%,
6774,1,0.0%,
2676,1,0.0%,
12915,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
0,1,0.0%,
1,1,0.0%,
2,1,0.0%,
3,1,0.0%,
4,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
14830,1,0.0%,
14831,1,0.0%,
14832,1,0.0%,
14833,1,0.0%,
14834,1,0.0%,

Unnamed: 0,Date,Time,Latitude,Longitude
0,2011-08-29,6:56:58 PM,42.391623,-88.089163
1,2011-08-29,6:57:08 PM,42.391348,-88.089163
2,2011-08-29,6:57:18 PM,42.391022,-88.089157
3,2011-08-29,6:57:28 PM,42.390637,-88.089158
4,2011-08-29,6:57:38 PM,42.39041,-88.088858


In [13]:
# Dropped the duplicate rows
df_s.drop_duplicates(inplace = True)

In [9]:
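# Positive records only; this output predates the duplicate removal (551 positives here, 503 in the profile above)
df_train[df_train.WnvPresent == 1].describe(include=['object'])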

Unnamed: 0,Date,Address,Species,Street,Trap,AddressNumberAndStreet
count,551,551,551,551,551,551
unique,53,99,3,92,97,99
top,2013-08-29,"ORD Terminal 5, O'Hare International Airport, ...",CULEX PIPIENS/RESTUANS,W OHARE AIRPORT,T900,"1000 W OHARE AIRPORT, Chicago, IL"
freq,38,66,262,66,66,66


In [10]:
df_train[df_train.WnvPresent == 1].Species.unique()

array(['CULEX PIPIENS/RESTUANS', 'CULEX PIPIENS', 'CULEX RESTUANS'], dtype=object)

In [8]:
from __future__ import print_function  # lets the print() calls below run on Python 2 and 3

# Exploratory peeks at the weather data. None of these results is stored,
# and a notebook only displays the last expression in a cell.
df_w.sort_values('Date')
df_w.sort_values('Station')
df_w.Station.unique()       # return the unique values
df_w.columns
df_w.dtypes

def eda(dataframe):
    """Print a quick summary of a dataframe."""
    print("missing values\n", dataframe.isnull().sum())    # null count per column
    print('')                                              # blank line
    print("dataframe types \n", dataframe.dtypes)          # data types
    print('')
    print("dataframe shape \n", dataframe.shape)           # (row count, column count)
    print('')
    print("dataframe describe \n", dataframe.describe())   # count, mean, std, min, Q1, median, Q3, max
    print(' ')
    print('duplicates \n', dataframe.duplicated().sum())   # rows identical to an earlier row
    print(' ')
    print('number of unique values for each column')
    for item in dataframe:
        print(item)
        print(' ')
        print(dataframe[item].nunique())                   # number of unique values in the column

# eda() prints its report and returns None, so call it directly instead of printing its result
print('************************WEATHER DATAFRAME EDA**********************')
eda(df_w)
print('*************************TRAIN DATAFRAME EDA***********************')
eda(df_train)
print('*************************SPRAY DATAFRAME EDA***********************')
eda(df_s)

************************WEATHER DATAFRAME EDA**********************
missing values
Station        0
Date           0
Tmax           0
Tmin           0
Tavg           0
Depart         0
DewPoint       0
WetBulb        0
Heat           0
Cool           0
Sunrise        0
Sunset         0
CodeSum        0
Depth          0
Water1         0
SnowFall       0
PrecipTotal    0
StnPressure    0
SeaLevel       0
ResultSpeed    0
ResultDir      0
AvgSpeed       0
dtype: int64

dataframe types 
Station          int64
Date            object
Tmax             int64
Tmin             int64
Tavg            object
Depart          object
DewPoint         int64
WetBulb         object
Heat            object
Cool            object
Sunrise         object
Sunset          object
CodeSum         object
Depth           object
Water1          object
SnowFall        object
PrecipTotal     object
StnPressure     object
SeaLevel        object
ResultSpeed    float64
ResultDir        int64
AvgSpeed        object
dtype:
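
The weather report above shows zero nulls, yet many numeric-looking columns (`Tavg`, `PrecipTotal`, `StnPressure`, and others) have dtype `object`. In this dataset that is because missing readings are stored as the string `M` (or `-`) and trace precipitation as `T`, so `isnull()` never sees them. The sketch below is one possible way to convert such columns; the helper name `to_numeric_weather`, the toy frame, and the stand-in value for trace precipitation are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the weather data; values are illustrative only.
toy_weather = pd.DataFrame({
    "Station": [1, 2, 1, 2],
    "Tavg": ["74", "77", "M", "70"],
    "PrecipTotal": ["0.00", "  T", "0.23", "M"],
})

TRACE_AMOUNT = 0.005  # assumption: a small stand-in value for "trace" precipitation


def to_numeric_weather(series, trace_value=None):
    """Convert a string column to floats; 'M', '-' and other non-numbers become NaN."""
    cleaned = series.astype(str).str.strip()
    if trace_value is not None:
        # Replace the exact value "T" with a small numeric stand-in.
        cleaned = cleaned.replace("T", str(trace_value))
    return pd.to_numeric(cleaned, errors="coerce")


tavg = to_numeric_weather(toy_weather["Tavg"])
precip = to_numeric_weather(toy_weather["PrecipTotal"], trace_value=TRACE_AMOUNT)

print("missing Tavg readings:", int(tavg.isna().sum()))
print("missing PrecipTotal readings:", int(precip.isna().sum()))
```

After a conversion like this, `isnull().sum()` reflects the real amount of missing data, which matters when deciding whether to impute, drop, or fall back on the other station's reading.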

# 4) Mine the Data

We are using the _train_ data file as our sample and we will train our models on this dataset.

We will merge the data files we plan to use (spray, weather and train), then perform our analysis on the merged file.
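
As a rough sketch of what a date-based merge does, the example below joins a hypothetical toy trap table to a toy weather table (all values are made up). Two points are worth noting. First, the weather data carries one row per station per date, so every trap record is paired with both stations and the row count doubles. Second, `indicator=True` only reveals unmatched rows when the join is `how="left"` or `how="outer"`; with the default inner join, unmatched rows are dropped and `_merge` can only ever be `both`.

```python
import pandas as pd

# Hypothetical toy frames; column names follow the real data, values do not.
toy_train = pd.DataFrame({
    "Date": ["2007-05-29", "2007-05-29", "2007-06-05"],
    "Trap": ["T002", "T007", "T002"],
    "WnvPresent": [0, 0, 1],
})
toy_weather = pd.DataFrame({
    "Date": ["2007-05-29", "2007-05-29", "2007-06-05", "2007-06-05"],
    "Station": [1, 2, 1, 2],
    "Tmax": [88, 88, 75, 76],
})

# A left join keeps every trap record, so _merge can flag dates with no weather row.
both_stations = pd.merge(toy_train, toy_weather, on="Date", how="left", indicator=True)
unmatched = int((both_stations["_merge"] != "both").sum())

# One option for keeping a single row per trap record: use one station only.
# validate="many_to_one" raises an error if a date appears twice on the right side.
station_1 = toy_weather[toy_weather["Station"] == 1]
one_per_trap = pd.merge(toy_train, station_1, on="Date", how="left",
                        validate="many_to_one")

print(len(toy_train), len(both_stations), len(one_per_trap), unmatched)
```

Whether to keep both stations, pick one, or average them is a modelling decision; the sketch only shows how each choice affects the row count.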

In [26]:
# Merge the train and weather dataframes on date

cdf = pd.merge(df_train, df_w, left_on = 'Date',
               right_on = 'Date', indicator = True) # "indicator = True" adds a "_merge" column showing whether each
                                                    # row came from the left frame only, the right frame only, or both.
cdf._merge.unique()                                 # Only "both" appears. Note that the default inner join keeps
                                                    # matched rows only, so unmatched dates would be dropped, not flagged.

pdp.ProfileReport(cdf)

0,1
Number of variables,34
Number of observations,19386
Total Missing (%),0.0%
Total size in memory,5.0 MiB
Average record size in memory,273.0 B

0,1
Numeric,12
Categorical,20
Date,0
Text (Unique),0
Rejected,2

0,1
Distinct count,138
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0

0,1
"ORD Terminal 5, O'Hare International Airport, Chicago, IL 60666, USA",1224
"South Doty Avenue, Chicago, IL, USA",424
"South Stony Island Avenue, Chicago, IL, USA",362
Other values (135),17376

Value,Count,Frequency (%),Unnamed: 3
"ORD Terminal 5, O'Hare International Airport, Chicago, IL 60666, USA",1224,6.3%,
"South Doty Avenue, Chicago, IL, USA",424,2.2%,
"South Stony Island Avenue, Chicago, IL, USA",362,1.9%,
"4200 West 127th Street, Alsip, IL 60803, USA",344,1.8%,
"4100 North Oak Park Avenue, Chicago, IL 60634, USA",336,1.7%,
"2200 North Cannon Drive, Chicago, IL 60614, USA",314,1.6%,
"7000 West Armitage Avenue, Chicago, IL 60707, USA",308,1.6%,
"University of Illinois at Chicago, 1100 South Ashland Avenue, Chicago, IL 60607, USA",298,1.5%,
"5000 South Central Avenue, Chicago, IL 60638, USA",292,1.5%,
"1100 Roosevelt Road, Chicago, IL 60608, USA",292,1.5%,

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,7.9411
Minimum,3
Maximum,9
Zeros (%),0.0%

0,1
Minimum,3
5-th percentile,5
Q1,8
Median,8
Q3,9
95-th percentile,9
Maximum,9
Range,6
Interquartile range,1

0,1
Standard deviation,1.3509
Coef of variation,0.17012
Kurtosis,1.809
Mean,7.9411
MAD,0.88085
Skewness,-1.6477
Sum,153946
Variance,1.8251
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
8,9044,46.7%,
9,7560,39.0%,
5,2604,13.4%,
3,178,0.9%,

Value,Count,Frequency (%),Unnamed: 3
3,178,0.9%,
5,2604,13.4%,
8,9044,46.7%,
9,7560,39.0%,

Value,Count,Frequency (%),Unnamed: 3
3,178,0.9%,
5,2604,13.4%,
8,9044,46.7%,
9,7560,39.0%,

0,1
Distinct count,138
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0

0,1
"1000 W OHARE AIRPORT, Chicago, IL",1224
"1200 S DOTY AVE, Chicago, IL",424
"1000 S STONY ISLAND AVE, Chicago, IL",362
Other values (135),17376

Value,Count,Frequency (%),Unnamed: 3
"1000 W OHARE AIRPORT, Chicago, IL",1224,6.3%,
"1200 S DOTY AVE, Chicago, IL",424,2.2%,
"1000 S STONY ISLAND AVE, Chicago, IL",362,1.9%,
"4200 W 127TH PL, Chicago, IL",344,1.8%,
"4100 N OAK PARK AVE, Chicago, IL",336,1.7%,
"2200 N CANNON DR, Chicago, IL",314,1.6%,
"7000 W ARMITAGE AVENUE, Chicago, IL",308,1.6%,
"1100 S ASHLAND AVE, Chicago, IL",298,1.5%,
"1100 W ROOSEVELT, Chicago, IL",292,1.5%,
"5000 S CENTRAL AVE, Chicago, IL",292,1.5%,

0,1
Distinct count,87
Unique (%),0.4%
Missing (%),0.0%
Missing (n),0

0,1
6.0,1001
7.1,692
3.7,576
Other values (84),17117

Value,Count,Frequency (%),Unnamed: 3
6.0,1001,5.2%,
7.1,692,3.6%,
3.7,576,3.0%,
5.9,549,2.8%,
5.8,540,2.8%,
7.8,530,2.7%,
6.7,526,2.7%,
4.6,506,2.6%,
7.0,468,2.4%,
4.1,456,2.4%,

0,1
Distinct count,64
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,37.232
Minimum,10
Maximum,98
Zeros (%),0.0%

0,1
Minimum,10
5-th percentile,10
Q1,13
Median,36
Q3,58
95-th percentile,82
Maximum,98
Range,88
Interquartile range,45

0,1
Standard deviation,24.33
Coef of variation,0.65347
Kurtosis,-0.81162
Mean,37.232
MAD,20.676
Skewness,0.54299
Sum,721774
Variance,591.94
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
10,2856,14.7%,
11,1414,7.3%,
22,970,5.0%,
13,686,3.5%,
37,632,3.3%,
17,608,3.1%,
70,578,3.0%,
42,578,3.0%,
12,550,2.8%,
52,542,2.8%,

Value,Count,Frequency (%),Unnamed: 3
10,2856,14.7%,
11,1414,7.3%,
12,550,2.8%,
13,686,3.5%,
14,194,1.0%,

Value,Count,Frequency (%),Unnamed: 3
90,154,0.8%,
91,216,1.1%,
93,42,0.2%,
96,60,0.3%,
98,46,0.2%,

0,1
Distinct count,31
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
,10435
RA,1417
RA BR,1242
Other values (28),6292

Value,Count,Frequency (%),Unnamed: 3
,10435,53.8%,
RA,1417,7.3%,
RA BR,1242,6.4%,
HZ,976,5.0%,
BR,559,2.9%,
TSRA RA BR,558,2.9%,
TS TSRA RA BR,544,2.8%,
BR HZ,542,2.8%,
RA BR HZ,426,2.2%,
TSRA RA,405,2.1%,

0,1
Distinct count,22
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
0,3602
12,1554
7,1240
Other values (19),12990

Value,Count,Frequency (%),Unnamed: 3
0,3602,18.6%,
12,1554,8.0%,
7,1240,6.4%,
6,1180,6.1%,
15,1141,5.9%,
13,1123,5.8%,
10,1112,5.7%,
14,1109,5.7%,
8,1094,5.6%,
11,1051,5.4%,

0,1
Distinct count,95
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0

0,1
2007-08-01,912
2007-08-15,532
2007-10-04,370
Other values (92),17572

Value,Count,Frequency (%),Unnamed: 3
2007-08-01,912,4.7%,
2007-08-15,532,2.7%,
2007-10-04,370,1.9%,
2007-08-21,370,1.9%,
2007-08-24,370,1.9%,
2007-08-07,368,1.9%,
2013-08-08,362,1.9%,
2013-08-01,344,1.8%,
2013-07-19,340,1.8%,
2011-07-25,336,1.7%,

0,1
Distinct count,29
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
M,9693
7,869
8,823
Other values (26),8001

Value,Count,Frequency (%),Unnamed: 3
M,9693,50.0%,
7,869,4.5%,
8,823,4.2%,
5,815,4.2%,
4,652,3.4%,
-2,648,3.3%,
10,584,3.0%,
3,440,2.3%,
-3,409,2.1%,
-6,406,2.1%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
M,9693
0,9693

Value,Count,Frequency (%),Unnamed: 3
M,9693,50.0%,
0,9693,50.0%,

0,1
Distinct count,36
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,59.382
Minimum,38
Maximum,73
Zeros (%),0.0%

0,1
Minimum,38
5-th percentile,44
Q1,54
Median,60
Q3,67
95-th percentile,70
Maximum,73
Range,35
Interquartile range,13

0,1
Standard deviation,7.913
Coef of variation,0.13326
Kurtosis,-0.48036
Mean,59.382
MAD,6.5195
Skewness,-0.40424
Sum,1151187
Variance,62.616
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
67,1406,7.3%,
59,1203,6.2%,
54,1008,5.2%,
55,960,5.0%,
60,953,4.9%,
63,947,4.9%,
62,942,4.9%,
70,941,4.9%,
69,932,4.8%,
56,913,4.7%,

Value,Count,Frequency (%),Unnamed: 3
38,120,0.6%,
39,13,0.1%,
40,133,0.7%,
41,61,0.3%,
42,192,1.0%,

Value,Count,Frequency (%),Unnamed: 3
69,932,4.8%,
70,941,4.9%,
71,482,2.5%,
72,31,0.2%,
73,368,1.9%,

0,1
Distinct count,15
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
0,16323
1,702
4,351
Other values (12),2010

Value,Count,Frequency (%),Unnamed: 3
0,16323,84.2%,
1,702,3.6%,
4,351,1.8%,
2,288,1.5%,
9,283,1.5%,
11,225,1.2%,
10,218,1.1%,
13,198,1.0%,
5,192,1.0%,
8,191,1.0%,

0,1
Distinct count,138
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,41.848
Minimum,41.645
Maximum,42.017
Zeros (%),0.0%

0,1
Minimum,41.645
5-th percentile,41.673
Q1,41.75
Median,41.867
Q3,41.955
95-th percentile,41.995
Maximum,42.017
Range,0.37282
Interquartile range,0.20419

0,1
Standard deviation,0.10941
Coef of variation,0.0026146
Kurtosis,-1.3726
Mean,41.848
MAD,0.098515
Skewness,-0.14923
Sum,811260
Variance,0.011971
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
41.974689,1224,6.3%,
41.673408,424,2.2%,
41.726465,362,1.9%,
41.662014,344,1.8%,
41.95469,336,1.7%,
41.921965,314,1.6%,
41.916265,308,1.6%,
41.868077,298,1.5%,
41.867108,292,1.5%,
41.801498,292,1.5%,

Value,Count,Frequency (%),Unnamed: 3
41.644612,34,0.2%,
41.659112,210,1.1%,
41.662014,344,1.8%,
41.673408,424,2.2%,
41.678618,254,1.3%,

Value,Count,Frequency (%),Unnamed: 3
42.008314,270,1.4%,
42.009876,98,0.5%,
42.010412,126,0.6%,
42.011601,130,0.7%,
42.01743,134,0.7%,

0,1
Distinct count,138
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-87.703
Minimum,-87.931
Maximum,-87.532
Zeros (%),0.0%

0,1
Minimum,-87.931
5-th percentile,-87.891
Q1,-87.761
Median,-87.698
Q3,-87.643
95-th percentile,-87.547
Maximum,-87.532
Range,0.39936
Interquartile range,0.1179

0,1
Standard deviation,0.093462
Coef of variation,-0.0010657
Kurtosis,-0.32273
Mean,-87.703
MAD,0.074303
Skewness,-0.3074
Sum,-1700200
Variance,0.0087351
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
-87.890615,1224,6.3%,
-87.599862,424,2.2%,
-87.585413,362,1.9%,
-87.724608,344,1.8%,
-87.800991,336,1.7%,
-87.632085,314,1.6%,
-87.800515,308,1.6%,
-87.666901,298,1.5%,
-87.763416,292,1.5%,
-87.654224,292,1.5%,

Value,Count,Frequency (%),Unnamed: 3
-87.930995,256,1.3%,
-87.890615,1224,6.3%,
-87.862995,154,0.8%,
-87.832763,266,1.4%,
-87.824812,70,0.4%,

Value,Count,Frequency (%),Unnamed: 3
-87.538693,210,1.1%,
-87.536497,114,0.6%,
-87.535198,278,1.4%,
-87.531657,46,0.2%,
-87.531635,82,0.4%,

0,1
Distinct count,50
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,10.211
Minimum,1
Maximum,50
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,1
Q1,2
Median,4
Q3,13
95-th percentile,45
Maximum,50
Range,49
Interquartile range,11

0,1
Standard deviation,13.138
Coef of variation,1.2867
Kurtosis,2.1755
Mean,10.211
MAD,9.8014
Skewness,1.7773
Sum,197942
Variance,172.62
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
1,4522,23.3%,
2,2568,13.2%,
3,1764,9.1%,
4,1180,6.1%,
5,970,5.0%,
6,796,4.1%,
7,652,3.4%,
50,624,3.2%,
8,484,2.5%,
9,468,2.4%,

Value,Count,Frequency (%),Unnamed: 3
1,4522,23.3%,
2,2568,13.2%,
3,1764,9.1%,
4,1180,6.1%,
5,970,5.0%,

Value,Count,Frequency (%),Unnamed: 3
46,84,0.4%,
47,74,0.4%,
48,70,0.4%,
49,70,0.4%,
50,624,3.2%,

0,1
Distinct count,47
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
0.00,10760
T,1838
0.23,564
Other values (44),6224

Value,Count,Frequency (%),Unnamed: 3
0.00,10760,55.5%,
T,1838,9.5%,
0.23,564,2.9%,
0.06,460,2.4%,
0.01,415,2.1%,
0.36,303,1.6%,
0.02,302,1.6%,
0.16,284,1.5%,
0.83,266,1.4%,
0.84,226,1.2%,

0,1
Distinct count,36
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,17.789
Minimum,1
Maximum,36
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,4
Q1,9
Median,19
Q3,24
95-th percentile,33
Maximum,36
Range,35
Interquartile range,15

0,1
Standard deviation,9.2585
Coef of variation,0.52045
Kurtosis,-1.0607
Mean,17.789
MAD,7.9425
Skewness,-0.011252
Sum,344865
Variance,85.719
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
23,1365,7.0%,
24,1164,6.0%,
19,1104,5.7%,
25,1069,5.5%,
5,986,5.1%,
13,934,4.8%,
30,831,4.3%,
17,805,4.2%,
9,777,4.0%,
7,776,4.0%,

Value,Count,Frequency (%),Unnamed: 3
1,126,0.6%,
2,182,0.9%,
3,626,3.2%,
4,500,2.6%,
5,986,5.1%,

Value,Count,Frequency (%),Unnamed: 3
32,208,1.1%,
33,109,0.6%,
34,260,1.3%,
35,306,1.6%,
36,340,1.8%,

0,1
Distinct count,89
Unique (%),0.5%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,5.9695
Minimum,0.1
Maximum,15.4
Zeros (%),0.0%

0,1
Minimum,0.1
5-th percentile,1.9
Q1,3.9
Median,5.5
Q3,7.8
95-th percentile,10.7
Maximum,15.4
Range,15.3
Interquartile range,3.9

0,1
Standard deviation,2.8889
Coef of variation,0.48394
Kurtosis,0.075825
Mean,5.9695
MAD,2.3054
Skewness,0.58996
Sum,115720
Variance,8.3456
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
3.5,892,4.6%,
6.2,712,3.7%,
2.1,624,3.2%,
5.8,565,2.9%,
3.4,558,2.9%,
4.1,518,2.7%,
6.4,517,2.7%,
9.1,475,2.5%,
5.5,472,2.4%,
8.3,411,2.1%,

Value,Count,Frequency (%),Unnamed: 3
0.1,109,0.6%,
1.1,109,0.6%,
1.2,189,1.0%,
1.4,120,0.6%,
1.5,302,1.6%,

Value,Count,Frequency (%),Unnamed: 3
12.9,93,0.5%,
13.3,170,0.9%,
13.4,186,1.0%,
14.6,61,0.3%,
15.4,61,0.3%,

0,1
Distinct count,56
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
29.98,1393
30.05,1204
29.95,910
Other values (53),15879

Value,Count,Frequency (%),Unnamed: 3
29.98,1393,7.2%,
30.05,1204,6.2%,
29.95,910,4.7%,
30.04,887,4.6%,
30.00,883,4.6%,
29.91,874,4.5%,
29.89,848,4.4%,
29.87,815,4.2%,
30.11,772,4.0%,
29.82,696,3.6%,

0,1
Distinct count,3
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
M,9693
0.0,9517
T,176

Value,Count,Frequency (%),Unnamed: 3
M,9693,50.0%,
0.0,9517,49.1%,
T,176,0.9%,

0,1
Distinct count,7
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
CULEX PIPIENS/RESTUANS,8938
CULEX RESTUANS,5344
CULEX PIPIENS,4478
Other values (4),626

Value,Count,Frequency (%),Unnamed: 3
CULEX PIPIENS/RESTUANS,8938,46.1%,
CULEX RESTUANS,5344,27.6%,
CULEX PIPIENS,4478,23.1%,
CULEX TERRITANS,442,2.3%,
CULEX SALINARIUS,170,0.9%,
CULEX TARSALIS,12,0.1%,
CULEX ERRATICUS,2,0.0%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.5
Minimum,1
Maximum,2
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,1.0
Q1,1.0
Median,1.5
Q3,2.0
95-th percentile,2.0
Maximum,2.0
Range,1.0
Interquartile range,1.0

0,1
Standard deviation,0.50001
Coef of variation,0.33334
Kurtosis,-2.0002
Mean,1.5
MAD,0.5
Skewness,0
Sum,29079
Variance,0.25001
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
2,9693,50.0%,
1,9693,50.0%,

Value,Count,Frequency (%),Unnamed: 3
1,9693,50.0%,
2,9693,50.0%,

Value,Count,Frequency (%),Unnamed: 3
1,9693,50.0%,
2,9693,50.0%,

0,1
Distinct count,53
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
29.34,1975
29.29,1025
29.23,1018
Other values (50),15368

Value,Count,Frequency (%),Unnamed: 3
29.34,1975,10.2%,
29.29,1025,5.3%,
29.23,1018,5.3%,
29.26,958,4.9%,
29.18,851,4.4%,
29.28,849,4.4%,
29.21,822,4.2%,
29.39,745,3.8%,
29.33,703,3.6%,
29.36,642,3.3%,

0,1
Distinct count,128
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0

0,1
W OHARE AIRPORT,1224
S ASHLAND AVE,524
S STONY ISLAND AVE,428
Other values (125),17210

Value,Count,Frequency (%),Unnamed: 3
W OHARE AIRPORT,1224,6.3%,
S ASHLAND AVE,524,2.7%,
S STONY ISLAND AVE,428,2.2%,
S DOTY AVE,424,2.2%,
N OAK PARK AVE,398,2.1%,
W 51ST ST,370,1.9%,
N PULASKI RD,346,1.8%,
W 127TH PL,344,1.8%,
N CANNON DR,332,1.7%,
W ARMITAGE AVENUE,308,1.6%,

0,1
Distinct count,63
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
-,9693
0445,628
0416,612
Other values (60),8453

Value,Count,Frequency (%),Unnamed: 3
-,9693,50.0%,
0445,628,3.2%,
0416,612,3.2%,
0459,422,2.2%,
0528,404,2.1%,
0438,344,1.8%,
0419,320,1.7%,
0417,319,1.6%,
0451,315,1.6%,
0426,292,1.5%,

0,1
Distinct count,59
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
-,9693
1931,630
1911,628
Other values (56),8435

Value,Count,Frequency (%),Unnamed: 3
-,9693,50.0%,
1931,630,3.2%,
1911,628,3.2%,
1928,606,3.1%,
1854,422,2.2%,
1809,404,2.1%,
1923,346,1.8%,
1918,344,1.8%,
1930,318,1.6%,
1904,315,1.6%,

0,1
Distinct count,36
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
77,1554
72,1240
71,1180
Other values (33),15412

Value,Count,Frequency (%),Unnamed: 3
77,1554,8.0%,
72,1240,6.4%,
71,1180,6.1%,
80,1141,5.9%,
78,1123,5.8%,
75,1112,5.7%,
79,1109,5.7%,
73,1094,5.6%,
76,1051,5.4%,
81,998,5.1%,

0,1
Distinct count,38
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,81.612
Minimum,57
Maximum,97
Zeros (%),0.0%

0,1
Minimum,57
5-th percentile,62
Q1,78
Median,83
Q3,87
95-th percentile,92
Maximum,97
Range,40
Interquartile range,9

0,1
Standard deviation,8.3866
Coef of variation,0.10276
Kurtosis,0.57201
Mean,81.612
MAD,6.3745
Skewness,-0.9446
Sum,1582128
Variance,70.335
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
86,1678,8.7%,
82,1490,7.7%,
83,1469,7.6%,
91,1266,6.5%,
81,1246,6.4%,
84,1075,5.5%,
90,901,4.6%,
85,876,4.5%,
87,783,4.0%,
92,757,3.9%,

Value,Count,Frequency (%),Unnamed: 3
57,61,0.3%,
58,181,0.9%,
59,250,1.3%,
60,92,0.5%,
61,118,0.6%,

Value,Count,Frequency (%),Unnamed: 3
92,757,3.9%,
93,62,0.3%,
94,110,0.6%,
96,170,0.9%,
97,170,0.9%,

0,1
Distinct count,36
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,63.304
Minimum,41
Maximum,79
Zeros (%),0.0%

0,1
Minimum,41
5-th percentile,48
Q1,58
Median,65
Q3,69
95-th percentile,73
Maximum,79
Range,38
Interquartile range,11

0,1
Standard deviation,7.688
Coef of variation,0.12145
Kurtosis,-0.26776
Mean,63.304
MAD,6.3574
Skewness,-0.71525
Sum,1227207
Variance,59.105
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
69,2700,13.9%,
70,1255,6.5%,
71,1241,6.4%,
60,1206,6.2%,
63,896,4.6%,
57,810,4.2%,
58,791,4.1%,
73,791,4.1%,
64,763,3.9%,
67,748,3.9%,

Value,Count,Frequency (%),Unnamed: 3
41,65,0.3%,
43,13,0.1%,
44,142,0.7%,
45,61,0.3%,
46,382,2.0%,

Value,Count,Frequency (%),Unnamed: 3
73,791,4.1%,
74,260,1.3%,
75,170,0.9%,
76,48,0.2%,
79,48,0.2%,

0,1
Distinct count,136
Unique (%),0.7%
Missing (%),0.0%
Missing (n),0

0,1
T900,1224
T115,424
T138,362
Other values (133),17376

Value,Count,Frequency (%),Unnamed: 3
T900,1224,6.3%,
T115,424,2.2%,
T138,362,1.9%,
T135,344,1.8%,
T002,336,1.7%,
T054,314,1.6%,
T151,308,1.6%,
T090,298,1.5%,
T031,292,1.5%,
T048,292,1.5%,

0,1
Constant value,M

0,1
Distinct count,32
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
69,1858
62,1448
71,1410
Other values (29),14670

Value,Count,Frequency (%),Unnamed: 3
69,1858,9.6%,
62,1448,7.5%,
71,1410,7.3%,
70,1320,6.8%,
65,1099,5.7%,
64,972,5.0%,
67,890,4.6%,
63,881,4.5%,
72,871,4.5%,
61,867,4.5%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.051893
Minimum,0
Maximum,1
Zeros (%),94.8%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,1
Range,1
Interquartile range,0

0,1
Standard deviation,0.22182
Coef of variation,4.2745
Kurtosis,14.329
Mean,0.051893
MAD,0.0984
Skewness,4.0407
Sum,1006
Variance,0.049203
Memory size,302.9 KiB

Value,Count,Frequency (%),Unnamed: 3
0,18380,94.8%,
1,1006,5.2%,

Value,Count,Frequency (%),Unnamed: 3
0,18380,94.8%,
1,1006,5.2%,

Value,Count,Frequency (%),Unnamed: 3
0,18380,94.8%,
1,1006,5.2%,

0,1
Constant value,both

Unnamed: 0,Date,Address,Species,Block,Street,Trap,AddressNumberAndStreet,Latitude,Longitude,AddressAccuracy,NumMosquitos,WnvPresent,Station,Tmax,Tmin,Tavg,Depart,DewPoint,WetBulb,Heat,Cool,Sunrise,Sunset,CodeSum,Depth,Water1,SnowFall,PrecipTotal,StnPressure,SeaLevel,ResultSpeed,ResultDir,AvgSpeed,_merge
0,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0,1,88,60,74,10,58,65,0,9,0421,1917,BR HZ,0,M,0.0,0.0,29.39,30.11,5.8,18,6.5,both
1,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0,2,88,65,77,M,59,66,0,12,-,-,BR HZ,M,M,M,0.0,29.44,30.09,5.8,16,7.4,both
2,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0,1,88,60,74,10,58,65,0,9,0421,1917,BR HZ,0,M,0.0,0.0,29.39,30.11,5.8,18,6.5,both
3,2007-05-29,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,1,0,2,88,65,77,M,59,66,0,12,-,-,BR HZ,M,M,M,0.0,29.44,30.09,5.8,16,7.4,both
4,2007-05-29,"6200 North Mandell Avenue, Chicago, IL 60646, USA",CULEX RESTUANS,62,N MANDELL AVE,T007,"6200 N MANDELL AVE, Chicago, IL",41.994991,-87.769279,9,1,0,1,88,60,74,10,58,65,0,9,0421,1917,BR HZ,0,M,0.0,0.0,29.39,30.11,5.8,18,6.5,both


In [35]:
cdf.drop('_merge', axis = 1, inplace = True) # axis=0 for rows, 1 for columns
cdf.dtypes

[both]
Categories (1, object): [both]

In [37]:
cdf = pd.merge(cdf, df_s, left_on = 'Date',         # merged the spray dataframe with the cdf dataframe
               right_on = 'Date', indicator = True) # "indicator = True" creates a column displaying how each row 
                                                    # merged, on the left, right or both.
cdf._merge.unique()

[both]
Categories (1, object): [both]
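
The profile that follows reports 2,911,126 observations and only 5 distinct dates, compared with 19,386 observations and 95 dates before this merge. That pattern is what a many-to-many inner join on `Date` produces: every trap record on a sprayed date is paired with every spray point logged that day, and trap records on dates without spraying are dropped. The sketch below reproduces the effect on hypothetical toy data and shows one possible alternative, collapsing the spray data to one row per date before joining. The frames and the `SprayPoints` and `Sprayed` column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy frames; values are illustrative only.
toy_cdf = pd.DataFrame({
    "Date": ["2013-08-15"] * 3 + ["2013-08-22"] * 2 + ["2013-09-01"],
    "Trap": ["T002", "T007", "T900", "T002", "T900", "T002"],
})
toy_spray = pd.DataFrame({
    "Date": ["2013-08-15"] * 4 + ["2013-08-22"] * 2,
    "Latitude": [41.95, 41.96, 41.97, 41.98, 41.90, 41.91],
})

# Many-to-many inner join: rows per date = (trap rows) x (spray rows).
exploded = pd.merge(toy_cdf, toy_spray, on="Date")

# The row count can be predicted from per-date counts before merging.
left_counts = toy_cdf["Date"].value_counts()
right_counts = toy_spray["Date"].value_counts()
predicted_rows = int((left_counts * right_counts).dropna().sum())

# Alternative: one row per spray date, then a left join that keeps every trap record.
spray_days = toy_spray.groupby("Date").size().rename("SprayPoints").reset_index()
flagged = pd.merge(toy_cdf, spray_days, on="Date", how="left",
                   validate="many_to_one")
flagged["Sprayed"] = flagged["SprayPoints"].notna()

print(len(toy_cdf), len(exploded), predicted_rows, len(flagged))
```

A per-date flag ignores where the spraying happened; matching spray points to traps by distance would be a further step and is not shown here.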

In [40]:
cdf.dtypes
pdp.ProfileReport(cdf)

0,1
Number of variables,37
Number of observations,2911126
Total Missing (%),0.0%
Total size in memory,824.6 MiB
Average record size in memory,297.0 B

0,1
Numeric,14
Categorical,18
Date,0
Text (Unique),0
Rejected,5

0,1
Distinct count,73
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
"ORD Terminal 5, O'Hare International Airport, Chicago, IL 60666, USA",297610
"9100 West Higgins Road, Rosemont, IL 60018, USA",60758
"4100 North Oak Park Avenue, Chicago, IL 60634, USA",60150
Other values (70),2492608

Value,Count,Frequency (%),Unnamed: 3
"ORD Terminal 5, O'Hare International Airport, Chicago, IL 60666, USA",297610,10.2%,
"9100 West Higgins Road, Rosemont, IL 60018, USA",60758,2.1%,
"4100 North Oak Park Avenue, Chicago, IL 60634, USA",60150,2.1%,
"7000 North Moselle Avenue, Chicago, IL 60646, USA",59504,2.0%,
"2200 North Cannon Drive, Chicago, IL 60614, USA",54548,1.9%,
"1000 North Central Park Avenue, Chicago, IL 60651, USA",54404,1.9%,
"5800 North Western Avenue, Chicago, IL 60659, USA",52712,1.8%,
"University of Illinois at Chicago, 1100 South Ashland Avenue, Chicago, IL 60607, USA",51476,1.8%,
"8900 South Carpenter Street, Chicago, IL 60620, USA",49766,1.7%,
"South Brandon Avenue, Chicago, IL 60617, USA",48376,1.7%,

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,8.1349
Minimum,3
Maximum,9
Zeros (%),0.0%

0,1
Minimum,3
5-th percentile,5
Q1,8
Median,8
Q3,9
95-th percentile,9
Maximum,9
Range,6
Interquartile range,1

0,1
Standard deviation,1.218
Coef of variation,0.14973
Kurtosis,3.8182
Mean,8.1349
MAD,0.78887
Skewness,-2.0462
Sum,23681650
Variance,1.4835
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
9,1327262,45.6%,
8,1288360,44.3%,
5,271450,9.3%,
3,24054,0.8%,

Value,Count,Frequency (%),Unnamed: 3
3,24054,0.8%,
5,271450,9.3%,
8,1288360,44.3%,
9,1327262,45.6%,

Value,Count,Frequency (%),Unnamed: 3
3,24054,0.8%,
5,271450,9.3%,
8,1288360,44.3%,
9,1327262,45.6%,

0,1
Distinct count,73
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
"1000 W OHARE AIRPORT, Chicago, IL",297610
"9100 W HIGGINS RD, Chicago, IL",60758
"4100 N OAK PARK AVE, Chicago, IL",60150
Other values (70),2492608

Value,Count,Frequency (%),Unnamed: 3
"1000 W OHARE AIRPORT, Chicago, IL",297610,10.2%,
"9100 W HIGGINS RD, Chicago, IL",60758,2.1%,
"4100 N OAK PARK AVE, Chicago, IL",60150,2.1%,
"7000 N MOSELL AVE, Chicago, IL",59504,2.0%,
"2200 N CANNON DR, Chicago, IL",54548,1.9%,
"1000 N CENTRAL PARK DR, Chicago, IL",54404,1.9%,
"5800 N WESTERN AVE, Chicago, IL",52712,1.8%,
"1100 S ASHLAND AVE, Chicago, IL",51476,1.8%,
"8900 S CARPENTER ST, Chicago, IL",49766,1.7%,
"1300 S BRANDON, Chicago, IL",48376,1.7%,

0,1
Distinct count,9
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
7.0,530058
5.0,416208
4.9,416208
Other values (6),1548652

Value,Count,Frequency (%),Unnamed: 3
7.0,530058,18.2%,
5.0,416208,14.3%,
4.9,416208,14.3%,
4.6,315374,10.8%,
4.7,315374,10.8%,
5.1,242657,8.3%,
3.9,242657,8.3%,
9.3,216295,7.4%,
10.8,216295,7.4%,

0,1
Distinct count,43
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,38.494
Minimum,10
Maximum,91
Zeros (%),0.0%

0,1
Minimum,10
5-th percentile,10
Q1,13
Median,37
Q3,58
95-th percentile,89
Maximum,91
Range,81
Interquartile range,45

0,1
Standard deviation,24.792
Coef of variation,0.64405
Kurtosis,-0.97446
Mean,38.494
MAD,21.361
Skewness,0.44073
Sum,112061628
Variance,614.65
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
10,485658,16.7%,
11,151590,5.2%,
17,129772,4.5%,
22,128898,4.4%,
58,125154,4.3%,
70,97900,3.4%,
61,88860,3.1%,
39,84342,2.9%,
42,81076,2.8%,
13,78302,2.7%,

Value,Count,Frequency (%),Unnamed: 3
10,485658,16.7%,
11,151590,5.2%,
12,70536,2.4%,
13,78302,2.7%,
14,44516,1.5%,

Value,Count,Frequency (%),Unnamed: 3
79,24322,0.8%,
82,40878,1.4%,
89,70874,2.4%,
90,23538,0.8%,
91,60758,2.1%,

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
,1750320
RA,530058
BR HZ,315374

Value,Count,Frequency (%),Unnamed: 3
,1750320,60.1%,
RA,530058,18.2%,
BR HZ,315374,10.8%,
FG BR HZ,315374,10.8%,

0,1
Distinct count,8
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
13,630748
8,530058
1,416208
Other values (5),1334112

Value,Count,Frequency (%),Unnamed: 3
13,630748,21.7%,
8,530058,18.2%,
1,416208,14.3%,
0,416208,14.3%,
5,242657,8.3%,
4,242657,8.3%,
9,216295,7.4%,
7,216295,7.4%,

0,1
Distinct count,5
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
2013-08-15,832416
2013-08-29,630748
2013-08-22,530058
Other values (2),917904

Value,Count,Frequency (%),Unnamed: 3
2013-08-15,832416,28.6%,
2013-08-29,630748,21.7%,
2013-08-22,530058,18.2%,
2013-07-25,485314,16.7%,
2013-08-08,432590,14.9%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
M,1455563
-7,416208
8,315374
Other values (3),723981

Value,Count,Frequency (%),Unnamed: 3
M,1455563,50.0%,
-7,416208,14.3%,
8,315374,10.8%,
2,265029,9.1%,
-5,242657,8.3%,
-1,216295,7.4%,

0,1
Distinct count,10
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,57.892
Minimum,50
Maximum,67
Zeros (%),0.0%

0,1
Minimum,50
5-th percentile,50
Q1,53
Median,55
Q3,65
95-th percentile,67
Maximum,67
Range,17
Interquartile range,12

0,1
Standard deviation,6.3579
Coef of variation,0.10982
Kurtosis,-1.6478
Mean,57.892
MAD,5.9979
Skewness,0.27075
Sum,168531914
Variance,40.422
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
53,416208,14.3%,
50,416208,14.3%,
65,315374,10.8%,
64,315374,10.8%,
67,265029,9.1%,
66,265029,9.1%,
54,242657,8.3%,
52,242657,8.3%,
56,216295,7.4%,
55,216295,7.4%,

Value,Count,Frequency (%),Unnamed: 3
50,416208,14.3%,
52,242657,8.3%,
53,416208,14.3%,
54,242657,8.3%,
55,216295,7.4%,

Value,Count,Frequency (%),Unnamed: 3
56,216295,7.4%,
64,315374,10.8%,
65,315374,10.8%,
66,265029,9.1%,
67,265029,9.1%,

0,1
Constant value,0

0,1
Distinct count,73
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,41.863
Minimum,41.659
Maximum,42.01
Zeros (%),0.0%

0,1
Minimum,41.659
5-th percentile,41.681
Q1,41.768
Median,41.891
Q3,41.974
95-th percentile,41.995
Maximum,42.01
Range,0.3513
Interquartile range,0.20546

0,1
Standard deviation,0.10761
Coef of variation,0.0025705
Kurtosis,-1.3085
Mean,41.863
MAD,0.096569
Skewness,-0.29704
Sum,121870000
Variance,0.01158
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
41.974689,297610,10.2%,
41.992478,60758,2.1%,
41.95469,60150,2.1%,
42.008314,59504,2.0%,
41.921965,54548,1.9%,
41.89923,54404,1.9%,
41.986921,52712,1.8%,
41.868077,51476,1.8%,
41.732984,49766,1.7%,
41.740641,48376,1.7%,

Value,Count,Frequency (%),Unnamed: 3
41.659112,45092,1.5%,
41.662014,43732,1.5%,
41.673408,43268,1.5%,
41.680946,18946,0.7%,
41.682587,47780,1.6%,

Value,Count,Frequency (%),Unnamed: 3
41.992478,60758,2.1%,
41.994679,44162,1.5%,
42.008314,59504,2.0%,
42.009876,47604,1.6%,
42.010412,27268,0.9%,

0,1
Distinct count,8718
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,41.897
Minimum,41.714
Maximum,41.998
Zeros (%),0.0%

0,1
Minimum,41.714
5-th percentile,41.733
Q1,41.882
Median,41.927
Q3,41.948
95-th percentile,41.985
Maximum,41.998
Range,0.28388
Interquartile range,0.065333

0,1
Standard deviation,0.078035
Coef of variation,0.0018626
Kurtosis,-0.23698
Mean,41.897
MAD,0.062146
Skewness,-1.0563
Sum,121970000
Variance,0.0060895
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
41.94847,1540,0.1%,
41.9409616667,1388,0.0%,
41.9192216667,1348,0.0%,
41.9517116667,1238,0.0%,
41.9207566667,1086,0.0%,
41.934005,1086,0.0%,
41.9199783333,1086,0.0%,
41.9426366667,1036,0.0%,
41.919505,1036,0.0%,
41.9426283333,1036,0.0%,

Value,Count,Frequency (%),Unnamed: 3
41.713925,334,0.0%,
41.714005,334,0.0%,
41.7140416667,334,0.0%,
41.7141233333,334,0.0%,
41.7142683333,334,0.0%,

Value,Count,Frequency (%),Unnamed: 3
41.9974533333,274,0.0%,
41.9974933333,274,0.0%,
41.99759,274,0.0%,
41.997765,274,0.0%,
41.9978083333,274,0.0%,

0,1
Distinct count,73
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-87.715
Minimum,-87.931
Maximum,-87.535
Zeros (%),0.0%

0,1
Minimum,-87.931
5-th percentile,-87.891
Q1,-87.777
Median,-87.704
Q3,-87.654
95-th percentile,-87.563
Maximum,-87.535
Range,0.3958
Interquartile range,0.12297

0,1
Standard deviation,0.094483
Coef of variation,-0.0010772
Kurtosis,-0.50169
Mean,-87.715
MAD,0.07563
Skewness,-0.30553
Sum,-255350000
Variance,0.008927
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
-87.890615,297610,10.2%,
-87.862995,60758,2.1%,
-87.800991,60150,2.1%,
-87.777921,59504,2.0%,
-87.632085,54548,1.9%,
-87.716788,54404,1.9%,
-87.689778,52712,1.8%,
-87.666901,51476,1.8%,
-87.649642,49766,1.7%,
-87.546587,48376,1.7%,

Value,Count,Frequency (%),Unnamed: 3
-87.930995,16596,0.6%,
-87.890615,297610,10.2%,
-87.862995,60758,2.1%,
-87.832763,37208,1.3%,
-87.807277,47604,1.6%,

Value,Count,Frequency (%),Unnamed: 3
-87.562889,21108,0.7%,
-87.55551,21108,0.7%,
-87.546587,48376,1.7%,
-87.538693,45092,1.5%,
-87.535198,18946,0.7%,

0,1
Distinct count,8669
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,-87.722
Minimum,-87.818
Maximum,-87.587
Zeros (%),0.0%

0,1
Minimum,-87.818
5-th percentile,-87.803
Q1,-87.759
Median,-87.718
Q3,-87.696
95-th percentile,-87.612
Maximum,-87.587
Range,0.23168
Interquartile range,0.06218

0,1
Standard deviation,0.05078
Coef of variation,-0.00057888
Kurtosis,0.27711
Mean,-87.722
MAD,0.037705
Skewness,0.4014
Sum,-255370000
Variance,0.0025786
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
-87.7053116667,1448,0.0%,
-87.7108833333,1240,0.0%,
-87.7000933333,1086,0.0%,
-87.700105,1086,0.0%,
-87.7064283333,1058,0.0%,
-87.7158216667,1058,0.0%,
-87.5975,1002,0.0%,
-87.706445,998,0.0%,
-87.7064466667,998,0.0%,
-87.715245,998,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-87.8184083333,312,0.0%,
-87.81838,312,0.0%,
-87.8183533333,312,0.0%,
-87.8183316667,312,0.0%,
-87.8183183333,312,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-87.5867916667,334,0.0%,
-87.5867866667,334,0.0%,
-87.586775,334,0.0%,
-87.586755,334,0.0%,
-87.5867266667,334,0.0%,

0,1
Distinct count,50
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,14.255
Minimum,1
Maximum,50
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,1
Q1,3
Median,8
Q3,21
95-th percentile,46
Maximum,50
Range,49
Interquartile range,18

0,1
Standard deviation,13.847
Coef of variation,0.97137
Kurtosis,0.26076
Mean,14.255
MAD,11.265
Skewness,1.1624
Sum,41499176
Variance,191.74
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
3,255916,8.8%,
2,248800,8.5%,
1,228290,7.8%,
4,202746,7.0%,
7,154240,5.3%,
5,145414,5.0%,
6,131928,4.5%,
11,100426,3.4%,
50,96252,3.3%,
8,93440,3.2%,

Value,Count,Frequency (%),Unnamed: 3
1,228290,7.8%,
2,248800,8.5%,
3,255916,8.8%,
4,202746,7.0%,
5,145414,5.0%,

Value,Count,Frequency (%),Unnamed: 3
46,14988,0.5%,
47,18718,0.6%,
48,15732,0.5%,
49,3214,0.1%,
50,96252,3.3%,

0,1
Distinct count,3
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
0.0,2381068
0.23,265029
0.36,265029

Value,Count,Frequency (%),Unnamed: 3
0.0,2381068,81.8%,
0.23,265029,9.1%,
0.36,265029,9.1%,

0,1
Distinct count,5
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,17.6
Minimum,5
Maximum,24
Zeros (%),0.0%

0,1
Minimum,5
5-th percentile,5
Q1,10
Median,22
Q3,23
95-th percentile,24
Maximum,24
Range,19
Interquartile range,13

0,1
Standard deviation,7.4817
Coef of variation,0.42511
Kurtosis,-1.3617
Mean,17.6
MAD,7.0377
Skewness,-0.66426
Sum,51234583
Variance,55.975
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
23,750343,25.8%,
24,681237,23.4%,
10,630748,21.7%,
5,432590,14.9%,
22,416208,14.3%,

Value,Count,Frequency (%),Unnamed: 3
5,432590,14.9%,
10,630748,21.7%,
22,416208,14.3%,
23,750343,25.8%,
24,681237,23.4%,

0,1
Distinct count,9
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,4.4715
Minimum,1.5
Maximum,10.5
Zeros (%),0.0%

0,1
Minimum,1.5
5-th percentile,1.5
Q1,2.7
Median,4.2
Q3,4.5
95-th percentile,10.5
Maximum,10.5
Range,9.0
Interquartile range,1.8

0,1
Standard deviation,2.3348
Coef of variation,0.52217
Kurtosis,1.3418
Mean,4.4715
MAD,1.5175
Skewness,1.4085
Sum,13017000
Variance,5.4515
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
4.2,731582,25.1%,
4.5,416208,14.3%,
3.9,315374,10.8%,
1.5,265029,9.1%,
2.5,265029,9.1%,
4.1,242657,8.3%,
2.7,242657,8.3%,
8.6,216295,7.4%,
10.5,216295,7.4%,

Value,Count,Frequency (%),Unnamed: 3
1.5,265029,9.1%,
2.5,265029,9.1%,
2.7,242657,8.3%,
3.9,315374,10.8%,
4.1,242657,8.3%,

Value,Count,Frequency (%),Unnamed: 3
4.1,242657,8.3%,
4.2,731582,25.1%,
4.5,416208,14.3%,
8.6,216295,7.4%,
10.5,216295,7.4%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
30.05,1015372
30.13,832416
30.01,315374
Other values (3),747964

Value,Count,Frequency (%),Unnamed: 3
30.05,1015372,34.9%,
30.13,832416,28.6%,
30.01,315374,10.8%,
30.0,315374,10.8%,
29.98,216295,7.4%,
29.96,216295,7.4%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
0.0,1455563
M,1455563

Value,Count,Frequency (%),Unnamed: 3
0.0,1455563,50.0%,
M,1455563,50.0%,

0,1
Distinct count,5
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
CULEX PIPIENS/RESTUANS,1343138
CULEX PIPIENS,844042
CULEX RESTUANS,714344
Other values (2),9602

Value,Count,Frequency (%),Unnamed: 3
CULEX PIPIENS/RESTUANS,1343138,46.1%,
CULEX PIPIENS,844042,29.0%,
CULEX RESTUANS,714344,24.5%,
CULEX TERRITANS,6428,0.2%,
CULEX SALINARIUS,3174,0.1%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,1.5
Minimum,1
Maximum,2
Zeros (%),0.0%

0,1
Minimum,1.0
5-th percentile,1.0
Q1,1.0
Median,1.5
Q3,2.0
95-th percentile,2.0
Maximum,2.0
Range,1.0
Interquartile range,1.0

0,1
Standard deviation,0.5
Coef of variation,0.33333
Kurtosis,-2
Mean,1.5
MAD,0.5
Skewness,0
Sum,4366689
Variance,0.25
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
2,1455563,50.0%,
1,1455563,50.0%,

Value,Count,Frequency (%),Unnamed: 3
1,1455563,50.0%,
2,1455563,50.0%,

0,1
Distinct count,8
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
29.36,580403
29.33,458952
29.47,416208
Other values (5),1455563

Value,Count,Frequency (%),Unnamed: 3
29.36,580403,19.9%,
29.33,458952,15.8%,
29.47,416208,14.3%,
29.4,416208,14.3%,
29.29,315374,10.8%,
29.42,265029,9.1%,
29.39,242657,8.3%,
29.26,216295,7.4%,

0,1
Distinct count,68
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
W OHARE AIRPORT,297610
S ASHLAND AVE,97690
N CANNON DR,78562
Other values (65),2437264

Value,Count,Frequency (%),Unnamed: 3
W OHARE AIRPORT,297610,10.2%,
S ASHLAND AVE,97690,3.4%,
N CANNON DR,78562,2.7%,
W 51ST ST,65572,2.3%,
N PULASKI RD,61718,2.1%,
W HIGGINS RD,60758,2.1%,
N OAK PARK AVE,60150,2.1%,
N MOSELL AVE,59504,2.0%,
N CENTRAL PARK DR,54404,1.9%,
S STONY ISLAND AVE,53952,1.9%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
-,1455563
0459,416208
0514,315374
Other values (3),723981

Value,Count,Frequency (%),Unnamed: 3
-,1455563,50.0%,
0459,416208,14.3%,
0514,315374,10.8%,
0506,265029,9.1%,
0438,242657,8.3%,
0452,216295,7.4%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
-,1455563
1854,416208
1832,315374
Other values (3),723981

Value,Count,Frequency (%),Unnamed: 3
-,1455563,50.0%,
1854,416208,14.3%,
1832,315374,10.8%,
1843,265029,9.1%,
1918,242657,8.3%,
1903,216295,7.4%,

0,1
Distinct count,6328
Unique (%),0.2%
Missing (%),0.0%
Missing (n),0

0,1
8:36:55 PM,1370
8:59:25 PM,1370
8:44:35 PM,1370
Other values (6325),2907016

Value,Count,Frequency (%),Unnamed: 3
8:36:55 PM,1370,0.0%,
8:59:25 PM,1370,0.0%,
8:44:35 PM,1370,0.0%,
9:07:45 PM,1370,0.0%,
9:59:05 PM,1370,0.0%,
9:49:25 PM,1370,0.0%,
9:47:15 PM,1370,0.0%,
9:04:05 PM,1370,0.0%,
8:53:55 PM,1370,0.0%,
8:38:55 PM,1370,0.0%,

0,1
Distinct count,7
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,79.74
Minimum,74
Maximum,89
Zeros (%),0.0%

0,1
Minimum,74
5-th percentile,74
Q1,75
Median,78
Q3,81
95-th percentile,89
Maximum,89
Range,15
Interquartile range,6

0,1
Standard deviation,4.8994
Coef of variation,0.061442
Kurtosis,-0.69592
Mean,79.74
MAD,3.9992
Skewness,0.71546
Sum,232134089
Variance,24.004
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
78,746353,25.6%,
81,485314,16.7%,
75,416208,14.3%,
74,416208,14.3%,
89,315374,10.8%,
87,315374,10.8%,
79,216295,7.4%,

Value,Count,Frequency (%),Unnamed: 3
74,416208,14.3%,
75,416208,14.3%,
78,746353,25.6%,
79,216295,7.4%,
81,485314,16.7%,

Value,Count,Frequency (%),Unnamed: 3
78,746353,25.6%,
79,216295,7.4%,
81,485314,16.7%,
87,315374,10.8%,
89,315374,10.8%,

0,1
Distinct count,8
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,62.47
Minimum,55
Maximum,69
Zeros (%),0.0%

0,1
Minimum,55
5-th percentile,55
Q1,57
Median,65
Q3,68
95-th percentile,69
Maximum,69
Range,14
Interquartile range,11

0,1
Standard deviation,5.539
Coef of variation,0.088667
Kurtosis,-1.7714
Mean,62.47
MAD,5.3574
Skewness,-0.13784
Sum,181858786
Variance,30.681
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
57,658865,22.6%,
69,531669,18.3%,
55,416208,14.3%,
66,315374,10.8%,
68,265029,9.1%,
67,265029,9.1%,
58,242657,8.3%,
65,216295,7.4%,

Value,Count,Frequency (%),Unnamed: 3
55,416208,14.3%,
57,658865,22.6%,
58,242657,8.3%,
65,216295,7.4%,
66,315374,10.8%,

Value,Count,Frequency (%),Unnamed: 3
65,216295,7.4%,
66,315374,10.8%,
67,265029,9.1%,
68,265029,9.1%,
69,531669,18.3%,

0,1
Distinct count,73
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
T900,297610
T009,60758
T002,60150
Other values (70),2492608

Value,Count,Frequency (%),Unnamed: 3
T900,297610,10.2%,
T009,60758,2.1%,
T002,60150,2.1%,
T008,59504,2.0%,
T054,54548,1.9%,
T030,54404,1.9%,
T028,52712,1.8%,
T090,51476,1.8%,
T159,49766,1.7%,
T209,48376,1.7%,

0,1
Constant value,M

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
69,895777
58,832416
62,432590
Other values (3),750343

Value,Count,Frequency (%),Unnamed: 3
69,895777,30.8%,
58,832416,28.6%,
62,432590,14.9%,
68,265029,9.1%,
61,242657,8.3%,
60,242657,8.3%,

0,1
Distinct count,2
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.15484
Minimum,0
Maximum,1
Zeros (%),84.5%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,1
Range,1
Interquartile range,0

0,1
Standard deviation,0.36176
Coef of variation,2.3363
Kurtosis,1.6414
Mean,0.15484
MAD,0.26173
Skewness,1.9082
Sum,450768
Variance,0.13087
Memory size,44.4 MiB

Value,Count,Frequency (%),Unnamed: 3
0,2460358,84.5%,
1,450768,15.5%,

0,1
Constant value,both

Unnamed: 0,Date,Address,Species,Block,Street,Trap,AddressNumberAndStreet,Latitude_x,Longitude_x,AddressAccuracy,NumMosquitos,WnvPresent,Station,Tmax,Tmin,Tavg,Depart,DewPoint,WetBulb,Heat,Cool,Sunrise,Sunset,CodeSum,Depth,Water1,SnowFall,PrecipTotal,StnPressure,SeaLevel,ResultSpeed,ResultDir,AvgSpeed,Time,Latitude_y,Longitude_y,_merge
0,2013-07-25,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,3,0,1,81,57,69,-5,54,61,0,4,438,1918,,0,M,0.0,0.0,29.33,30.05,4.1,23,5.1,8:51:16 PM,41.96052,-87.739783,both
1,2013-07-25,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,3,0,1,81,57,69,-5,54,61,0,4,438,1918,,0,M,0.0,0.0,29.33,30.05,4.1,23,5.1,8:51:26 PM,41.960515,-87.739787,both
2,2013-07-25,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,3,0,1,81,57,69,-5,54,61,0,4,438,1918,,0,M,0.0,0.0,29.33,30.05,4.1,23,5.1,8:51:36 PM,41.960508,-87.739787,both
3,2013-07-25,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,3,0,1,81,57,69,-5,54,61,0,4,438,1918,,0,M,0.0,0.0,29.33,30.05,4.1,23,5.1,8:51:46 PM,41.960498,-87.739792,both
4,2013-07-25,"4100 North Oak Park Avenue, Chicago, IL 60634,...",CULEX PIPIENS/RESTUANS,41,N OAK PARK AVE,T002,"4100 N OAK PARK AVE, Chicago, IL",41.95469,-87.800991,9,3,0,1,81,57,69,-5,54,61,0,4,438,1918,,0,M,0.0,0.0,29.33,30.05,4.1,23,5.1,8:51:56 PM,41.960527,-87.7398,both
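
The `_x`/`_y` suffixes and the `_merge` column in the preview above are what pandas produces when two frames that share column names are merged with `indicator=True`. A minimal sketch with toy frames shows how the row count multiplies when the only merge key is the date. The frame names `traps` and `spray` and all values in them are made up for illustration; they are not the notebook's actual variables.

```python
import pandas as pd

# Toy stand-ins for the trap and spray data; names and values are hypothetical.
traps = pd.DataFrame({
    "Date": ["2013-07-25", "2013-07-25", "2013-08-01"],
    "Trap": ["T002", "T900", "T002"],
    "Latitude": [41.95, 41.97, 41.95],
})
spray = pd.DataFrame({
    "Date": ["2013-07-25", "2013-07-25", "2013-07-25"],
    "Time": ["8:51:16 PM", "8:51:26 PM", "8:51:36 PM"],
    "Latitude": [41.9605, 41.9605, 41.9605],
})

# Merging on Date alone pairs every trap row with every spray row of that day.
# Shared non-key columns get the default _x/_y suffixes, and indicator=True
# adds a _merge column recording where each row came from.
merged = traps.merge(spray, on="Date", how="left", indicator=True)

print(merged.shape)
print(merged["_merge"].value_counts())
```

With `how="left"`, trap rows from days without spraying survive as `left_only`; an inner merge would keep only the `both` rows. Either way, each trap record is repeated once per spray record on the same date, so any counts or class frequencies computed on the merged frame are weighted by spray activity.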


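In [48]:
# Inspect the distinct Heat values; only the last expression's output is displayed.
cdf.Heat.unique()
# cdf.dtypes
# df_w.Heat.unique()
# 'M' in the counts below marks a missing reading in the weather data.
df_w.Heat.value_counts()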

0     1870
4       88
1       86
2       81
8       67
3       66
5       61
15      57
7       49
12      49
11      48
10      48
13      46
9       46
6       45
14      36
16      29
20      28
19      24
18      24
21      19
17      17
23      15
22      12
M       11
24       7
25       5
26       4
28       2
29       2
27       2
Name: Heat, dtype: int64
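
The `M` entry in the counts above means `Heat` holds non-numeric values in `df_w`, so it cannot be used as a number yet. (The `dtype: int64` in the output describes the counts, not the column itself.) One possible way to handle this, sketched on a toy Series with made-up values, is to coerce the column to numeric so that `M` becomes `NaN`. Whether to drop or impute those rows afterwards is a separate modelling choice.

```python
import pandas as pd

# Toy stand-in for df_w.Heat; the values are hypothetical.
heat_raw = pd.Series(["0", "4", "M", "15", " 2"], name="Heat")

# Strip stray whitespace, then coerce: anything non-numeric (here 'M') becomes NaN.
heat = pd.to_numeric(heat_raw.str.strip(), errors="coerce")

n_missing = int(heat.isna().sum())
print(heat.tolist())
print(n_missing)
```

The same call could be applied to the other weather columns that use `M` or `-` as placeholders, after checking each one's distinct values first.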