# Chi-Square and Correlation Analysis
The primary focus is using Chi-Square Analysis to determine if there is a significant association between 
certain socio-demographic variables and the likelihood of individuals having asthma.

In [None]:
from google.colab import auth
from google.cloud import bigquery
from google.colab import data_table
project = 'capstone-400517' # Project ID inserted based on the query results selected to explore
location = 'US' # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)
data_table.enable_dataframe_formatter()
auth.authenticate_user()

## Reference SQL syntax from the original job
Use the ```jobs.query```
[method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query) to
return the SQL syntax from the job. This can be copied from the output cell
below to edit the query now or in the future. Alternatively, you can use
[this link](https://console.cloud.google.com/bigquery?j=capstone-400517:US:bquxjob_5daa4b90_18b3f35ac57)
back to BigQuery to edit the query within the BigQuery user interface.

In [None]:
# Running this code will display the query used to generate your previous job

job = client.get_job('bquxjob_5daa4b90_18b3f35ac57') # Job ID inserted based on the query results selected to explore
print(job.query)

SELECT * FROM `capstone-400517.capstone2.table_core`


## Result set loaded from BigQuery job as a DataFrame
Query results are referenced by the Job ID of the job that was run in BigQuery, so the query
does not need to be re-run to explore results. The ```to_dataframe```
[method](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe)
downloads the results to a Pandas DataFrame by using the BigQuery Storage API.

To edit the query syntax, use the BigQuery SQL editor.

In [None]:
# Running this code will read results from your previous job

job = client.get_job('bquxjob_5daa4b90_18b3f35ac57') # Job ID inserted based on the query results selected to explore
results = job.to_dataframe()
results.head(10)

## Chi-Square Testing
Use ```chi2_contingency``` from ```scipy.stats``` to perform the Chi-Square analysis. Each socio-demographic variable (Gender, Hospital Region, and Income Level) is cross-tabulated against the prevalence of Asthma. The chi-square statistic and p-value are reported for each test.
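
The three tests in this section repeat the same two steps: build a contingency table, then call ```chi2_contingency```. As a minimal sketch, those steps could be wrapped in one helper. The function name `chi_square_report` and the toy DataFrame are hypothetical and used only for illustration; they are not part of the original notebook or the capstone data.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_report(df, outcome, factor):
    """Cross-tabulate two categorical columns and run a chi-square test of independence."""
    table = pd.crosstab(df[outcome], df[factor])
    chi2, p, dof, expected = chi2_contingency(table)
    return {
        "table": table,
        "chi2": chi2,
        "p": p,
        "dof": dof,                      # (rows - 1) * (columns - 1)
        "min_expected": expected.min(),  # smallest expected cell count
    }


# Toy data for illustration only (NOT taken from the capstone table)
toy = pd.DataFrame({
    "Asthma": [0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0],
    "FEMALE": [0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1],
})

report = chi_square_report(toy, "Asthma", "FEMALE")
print(report["table"])
print(f"Chi2 value: {report['chi2']}, P-value: {report['p']}, dof: {report['dof']}")
```

With the real data, the same helper would presumably be called as `chi_square_report(results, 'Asthma', 'HOSP_REGION')`, and likewise for `FEMALE` and `ZIPINC_QRTL`.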

### Hospital Region and Asthma

In [None]:
import pandas as pd

contingency_table = pd.crosstab(results['Asthma'], results['HOSP_REGION'])
print(contingency_table)

HOSP_REGION       1       2        3       4
Asthma                                      
0            463696  645687  1116237  356591
1             12532    9937    17305    6150
2               379     296      628     133
3             34082   40092    63665   19396


In [None]:
from scipy.stats import chi2_contingency

chi2, p, _, _ = chi2_contingency(contingency_table)

print(f"Chi2 value: {chi2}")
print(f"P-value: {p}")


Chi2 value: 4275.206247305069
P-value: 0.0
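
A p-value printed as 0.0 is a floating-point underflow, not an exact zero. With roughly 2.8 million rows, even small differences between regions yield a very large chi-square statistic. The sketch below rebuilds the table from the counts printed above and adds Cramér's V, a common effect-size measure for contingency tables. This step is an addition for interpretation and was not part of the original analysis.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts copied from the Asthma x HOSP_REGION crosstab printed above
observed = np.array([
    [463696, 645687, 1116237, 356591],
    [12532, 9937, 17305, 6150],
    [379, 296, 628, 133],
    [34082, 40092, 63665, 19396],
])

chi2, p, dof, expected = chi2_contingency(observed)

n = observed.sum()
k = min(observed.shape) - 1  # smaller of (rows - 1) and (columns - 1)
cramers_v = np.sqrt(chi2 / (n * k))

print(f"dof: {dof}, n: {n}")
print(f"Cramér's V: {cramers_v:.4f}")
```

Values of Cramér's V near 0 indicate a weak association even when the p-value is extremely small, so the two numbers are best read together.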


### Gender and Asthma

In [None]:
import pandas as pd

contingency_table = pd.crosstab(results['Asthma'], results['FEMALE'])
print(contingency_table)

FEMALE        0        1
Asthma                  
0       1248246  1333965
1         27109    18815
2           854      582
3         75205    82030


In [None]:
from scipy.stats import chi2_contingency

chi2, p, _, _ = chi2_contingency(contingency_table)

print(f"Chi2 value: {chi2}")
print(f"P-value: {p}")

Chi2 value: 2162.5728298135064
P-value: 0.0
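
The chi-square statistic indicates that an association exists but not its direction. One way to see the direction is to convert the counts to row proportions. The sketch below uses the counts printed above. It is an illustrative addition, and the meaning of the Asthma codes 0 to 3 is not restated here because the notebook does not define them.

```python
import pandas as pd

# Counts copied from the Asthma x FEMALE crosstab printed above
counts = pd.DataFrame(
    {0: [1248246, 27109, 854, 75205], 1: [1333965, 18815, 582, 82030]},
    index=pd.Index([0, 1, 2, 3], name="Asthma"),
)
counts.columns.name = "FEMALE"

# Share of each FEMALE value within every Asthma code (each row sums to 1)
row_share = counts.div(counts.sum(axis=1), axis=0)
print(row_share.round(4))
```

On the full DataFrame, `pd.crosstab(results['Asthma'], results['FEMALE'], normalize='index')` should give the same shares directly.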


### Income and Asthma

In [None]:
import pandas as pd

contingency_table = pd.crosstab(results['Asthma'], results['ZIPINC_QRTL'])
print(contingency_table)

ZIPINC_QRTL       1       2       3       4    A    nan
Asthma                                                 
0            804022  634510  617432  499721  112  26414
1             18787   10317    9572    6951    2    295
2               605     334     274     208    0     15
3             55433   38216   35208   26796    1   1581


In [None]:
chi2, p, _, _ = chi2_contingency(contingency_table)

print(f"Chi2 value: {chi2}")
print(f"P-value: {p}")

Chi2 value: 3454.5200195700445
P-value: 0.0
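
The crosstab above includes two columns that are not income quartiles: `A` and `nan`. The `A` column is also very sparse, which pushes some expected counts below 5, a common rule of thumb for the chi-square approximation. As a hedged sketch, the test can be re-run on the four quartile columns only. This assumes `A` and `nan` are invalid or missing codes and that the column labels are strings; neither is confirmed by the notebook.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the Asthma x ZIPINC_QRTL crosstab printed above
table = pd.DataFrame(
    {
        "1": [804022, 18787, 605, 55433],
        "2": [634510, 10317, 334, 38216],
        "3": [617432, 9572, 274, 35208],
        "4": [499721, 6951, 208, 26796],
        "A": [112, 2, 0, 1],
        "nan": [26414, 295, 15, 1581],
    },
    index=pd.Index([0, 1, 2, 3], name="Asthma"),
)

# Expected counts for the full table, including the sparse "A" column
_, _, _, expected_full = chi2_contingency(table)
print("Smallest expected count (all columns):", expected_full.min())

# Keep only the four quartile columns (assumption: "A" and "nan" are not valid quartiles)
quartiles = table[["1", "2", "3", "4"]]
chi2, p, dof, expected = chi2_contingency(quartiles)

print(f"Chi2 value: {chi2}")
print(f"P-value: {p}")
print(f"dof: {dof}")
print("Smallest expected count (quartiles only):", expected.min())
```

On the full DataFrame, the equivalent filter would be applied to `results['ZIPINC_QRTL']` before calling `pd.crosstab`, for example with `isin(['1', '2', '3', '4'])` if the column holds strings.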
