<a href="https://colab.research.google.com/github/Amr-Abdulla/CountyHealthData/blob/main/Amr_county_health_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **How To Manage Data Easily**
**Introduction**

This tutorial uses the CountyHealthData 2014-2015 dataset and walks through handling and analyzing it. `.CSV` files are one of the most common formats for storing and sharing structured data, and Python handles them easily, especially with libraries such as `pandas` and `numpy`.

This tutorial walks you through how to load a `.CSV` file into a `pandas` DataFrame, explore its structure, filter data, and extract meaningful insights. It is aimed at both beginners and more experienced data enthusiasts who want to manage data effectively and improve their analytical workflow.

   

Instructions to Manage Data Using a CSV File

1. **Prepare Your Environment:**

*   Ensure you have Python installed with the necessary libraries: `numpy` and `pandas`.
*   Save your CSV file, e.g., `CountyHealthData_2014-2015 (1).csv`, in a directory you can access.

2. **Set Up Your Workspace:** Import the required libraries and check your working directory:


In [2]:
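import os
from google.colab import drive  # Colab-only helper for mounting Google Drive
print(os.getcwd())  # show the current working directory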

In [1]:
import numpy as np
import pandas as pd
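Before loading anything, it can help to confirm that the file is where you expect it to be. The sketch below uses only the standard library; `csv_path` is a hypothetical location (the working directory plus the file name used in this tutorial), so adjust it to match your own setup.

```python
import os

# Hypothetical location: the current working directory plus the tutorial's file name.
csv_path = os.path.join(os.getcwd(), "CountyHealthData_2014-2015 (1).csv")

# Report where Python is looking and whether the file is actually there.
print("Working directory:", os.getcwd())
print("File found:", os.path.exists(csv_path))
```

If `File found` prints `False`, fix the path before calling `pd.read_csv`.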

# Creating A Data Subset



3. **Load the Dataset:**

Load the `.CSV` file into a DataFrame. Ensure the path to your file is correct:



In [3]:
df = pd.read_csv('gdrive/CountyHealthData_2014-2015 (1).csv')
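If you want to try `pd.read_csv` without the real file, here is a minimal sketch that reads a small in-memory CSV instead. The name `toy_df` is hypothetical, the rows are copied from sample output shown later in this tutorial, and `parse_dates` is an optional extra that this tutorial does not otherwise use.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for the real file (not the full dataset).
csv_text = """State,County,Year,Premature death,Poor or fair health
KS,Clay County,1/1/2015,6451.0,0.099
KY,Hart County,1/1/2014,7807.0,0.251
NC,Alamance County,1/1/2014,7123.0,0.192
NC,Alamance County,1/1/2015,7291.0,0.192
"""

# read_csv accepts file-like objects as well as paths;
# parse_dates asks pandas to convert the Year column to datetimes.
toy_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Year"])

print(toy_df.head())
print(toy_df["Year"].dt.year.tolist())
```

The same call works with a file path in place of the `io.StringIO` object.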

4. **Inspect Dataset Size:**

Check the total number of elements with `df.size`, then confirm that it equals the number of rows times the number of columns (6109 × 64):



In [4]:
df.size

390976

In [12]:
df.size == 6109 * 64

True
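The check above hard-codes the row and column counts. As a small sketch on a hypothetical toy DataFrame (`toy_df`, with values taken from this tutorial's outputs), the same relationship can be read directly from `shape`:

```python
import pandas as pd

# Hypothetical toy DataFrame: 4 rows x 3 columns.
toy_df = pd.DataFrame({
    "State": ["KS", "KY", "NC", "NC"],
    "County": ["Clay County", "Hart County", "Alamance County", "Alamance County"],
    "Premature death": [6451.0, 7807.0, 7123.0, 7291.0],
})

# shape is a (rows, columns) tuple; size is their product.
n_rows, n_cols = toy_df.shape
print(n_rows, n_cols, toy_df.size)
print(toy_df.size == n_rows * n_cols)
```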

5. **Explore Column Names and Data Types:**

View column headers and the data types of each column:

In [11]:
df.columns

Index(['State', 'Region', 'Division', 'County', 'FIPS', 'GEOID', 'SMS Region',
       'Year', 'Premature death', 'Poor or fair health',
       'Poor physical health days', 'Poor mental health days',
       'Low birthweight', 'Adult smoking', 'Adult obesity',
       'Food environment index', 'Physical inactivity',
       'Access to exercise opportunities', 'Excessive drinking',
       'Alcohol-impaired driving deaths', 'Sexually transmitted infections',
       'Teen births', 'Uninsured', 'Primary care physicians', 'Dentists',
       'Mental health providers', 'Preventable hospital stays',
       'Diabetic screening', 'Mammography screening', 'High school graduation',
       'Some college', 'Unemployment', 'Children in poverty',
       'Income inequality', 'Children in single-parent households',
       'Social associations', 'Violent crime', 'Injury deaths',
       'Air pollution - particulate matter', 'Drinking water violations',
       'Severe housing problems', 'Driving alone to work'

In [None]:
df.dtypes

Unnamed: 0,0
State,object
Region,object
Division,object
County,object
FIPS,int64
...,...
Other primary care providers,float64
Median household income,int64
Children eligible for free lunch,float64
Homicide rate,float64
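Once you know the data types, you may want to split numeric columns from the rest. The following is a sketch using `select_dtypes` on a hypothetical toy DataFrame (`toy_df`); it is one possible approach rather than a step from the original notebook.

```python
import pandas as pd

# Hypothetical toy DataFrame with text, integer, and float columns.
toy_df = pd.DataFrame({
    "State": ["KS", "KY", "NC", "NC"],
    "County": ["Clay County", "Hart County", "Alamance County", "Alamance County"],
    "FIPS": [20027, 21099, 37001, 37001],
    "Premature death": [6451.0, 7807.0, 7123.0, 7291.0],
})

# Columns holding numbers (integers and floats).
numeric_cols = toy_df.select_dtypes(include="number").columns.tolist()

# Everything else (here, the text columns).
other_cols = toy_df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```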


6. **Inspect Data Sample:**

View a random sample of 5 rows from the DataFrame:


In [10]:
df.sample(n=5)

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
1773,KS,Midwest,West North Central,Clay County,20027,20027,Insuff Data,1/1/2015,6451.0,0.099,...,,0.157,0.067,9687.0,0.097,95.0,48178,0.274,,
2043,KY,South,East South Central,Hart County,21099,21099,Region 5,1/1/2014,7807.0,0.251,...,8.67,0.239,0.068,10848.0,0.198,38.0,31387,0.543,,0.233
1849,KS,Midwest,West North Central,Lyon County,20111,20111,Insuff Data,1/1/2014,6018.0,0.127,...,4.05,0.236,0.085,8415.0,0.104,39.0,39195,0.518,,0.201
1133,IA,Midwest,West North Central,Des Moines County,19057,19057,Insuff Data,1/1/2015,8326.0,0.158,...,8.87,0.114,0.033,8419.0,0.122,52.0,42882,0.445,,
5228,TX,South,West South Central,Montgomery County,48339,48339,Region 10,1/1/2015,6357.0,0.151,...,12.71,0.255,0.123,11650.0,0.144,43.0,69317,0.368,3.6,
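Because `sample` picks rows at random, each run returns different rows. If you need the same rows every time, one option is to pass `random_state`. The sketch below uses a hypothetical toy DataFrame (`toy_df`) and an arbitrary seed of 42.

```python
import pandas as pd

# Hypothetical toy DataFrame.
toy_df = pd.DataFrame({
    "State": ["KS", "KY", "NC", "NC"],
    "County": ["Clay County", "Hart County", "Alamance County", "Alamance County"],
    "Premature death": [6451.0, 7807.0, 7123.0, 7291.0],
})

# Fixing random_state makes the draw repeatable across runs.
first_draw = toy_df.sample(n=2, random_state=42)
second_draw = toy_df.sample(n=2, random_state=42)

print(first_draw)
print(first_draw.equals(second_draw))
```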


7. **Analyze the `Region` Column:**

*   Check the shape and data type of the `Region` column:

In [9]:
print(df.Region.shape)
print(df.Region.dtypes)

(6109,)
object


In [8]:
df.Region.shape

(6109,)

In [7]:
print(df.Region.dtypes)

object



*   Count how many columns have each data type:


In [6]:
df.dtypes.value_counts()

Unnamed: 0,count
float64,54
object,6
int64,4
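Step 7 accesses the column as an attribute (`df.Region`). That shorthand only works when the column name is a valid Python identifier; a name with a space, such as `SMS Region`, needs bracket notation. A brief sketch on a hypothetical toy DataFrame (`toy_df`):

```python
import pandas as pd

# Hypothetical toy DataFrame with one simple and one spaced column name.
toy_df = pd.DataFrame({
    "Region": ["Midwest", "South", "South", "South"],
    "SMS Region": ["Insuff Data", "Region 5", "Region 20", "Region 20"],
})

# Attribute access and bracket access return the same Series.
same_series = toy_df.Region.equals(toy_df["Region"])
print(same_series)

# Bracket access is required when the name contains a space.
sms_region = toy_df["SMS Region"]
print(sms_region.shape)
```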


8. **State-Specific Analysis:**

Analyze data filtered by a specific state, e.g., "NC" (North Carolina):

*   Count the occurrences of each state:

In [5]:
df.State.value_counts()

Unnamed: 0_level_0,count
State,Unnamed: 1_level_1
TX,469
GA,318
VA,266
KY,240
MO,229
IL,204
NC,200
KS,199
IA,198
TN,190




*   Filter rows where the state is "NC":

In [17]:
df[df["State"] == "NC"]


Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
3243,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2014,7123.0,0.192,...,10.48,0.259,0.073,8640.0,0.167,46.0,41394,0.444,4.94,0.202
3244,NC,South,South Atlantic,Alamance County,37001,37001,Region 20,1/1/2015,7291.0,0.192,...,12.38,0.249,0.088,9050.0,0.167,56.0,43001,0.455,4.60,
3245,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2014,7974.0,0.178,...,22.74,0.240,0.077,9316.0,0.205,30.0,39655,0.417,6.27,0.273
3246,NC,South,South Atlantic,Alexander County,37003,37003,Region 20,1/1/2015,8079.0,0.178,...,24.04,0.239,0.076,9242.0,0.205,32.0,46064,0.449,7.20,
3247,NC,South,South Atlantic,Alleghany County,37005,37005,Insuff Data,1/1/2014,8817.0,0.234,...,18.18,0.320,0.131,9585.0,0.210,55.0,34046,0.523,,0.215
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3438,NC,South,South Atlantic,Wilson County,37195,37195,Region 20,1/1/2015,8028.0,0.159,...,7.31,0.262,0.079,9450.0,0.107,77.0,40772,0.556,9.60,
3439,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2014,7893.0,0.207,...,18.45,0.252,0.097,10084.0,0.158,32.0,40012,0.422,3.76,0.241
3440,NC,South,South Atlantic,Yadkin County,37197,37197,Region 20,1/1/2015,7258.0,0.207,...,20.21,0.242,0.094,10998.0,0.158,32.0,40998,0.455,,
3441,NC,South,South Atlantic,Yancey County,37199,37199,Region 15,1/1/2014,6872.0,0.193,...,20.79,0.268,0.110,7707.0,0.158,79.0,36019,0.477,,0.176
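The filter above uses a single condition. As a sketch of how the same boolean-mask idea extends, the example below combines two conditions with `&` and matches several states with `isin`. The DataFrame `toy_df` is a hypothetical stand-in built from rows shown in this tutorial.

```python
import pandas as pd

# Hypothetical toy DataFrame.
toy_df = pd.DataFrame({
    "State": ["KS", "KY", "NC", "NC"],
    "County": ["Clay County", "Hart County", "Alamance County", "Alamance County"],
    "Year": ["1/1/2015", "1/1/2014", "1/1/2014", "1/1/2015"],
})

# Single condition, as in the tutorial.
nc_rows = toy_df[toy_df["State"] == "NC"]

# Two conditions: wrap each in parentheses and join with &.
nc_2015 = toy_df[(toy_df["State"] == "NC") & (toy_df["Year"] == "1/1/2015")]

# Several states at once.
nc_or_ky = toy_df[toy_df["State"].isin(["NC", "KY"])]

print(len(nc_rows), len(nc_2015), len(nc_or_ky))
```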


9. **Focus on Specific Columns for NC:**

Extract relevant data for counties in North Carolina:



*  County and adult smoking rate:

In [16]:
df.loc[df["State"] == "NC", ["County", "Adult smoking"]].head()


Unnamed: 0,County,Adult smoking
3243,Alamance County,0.238
3244,Alamance County,0.238
3245,Alexander County,0.26
3246,Alexander County,0.26
3247,Alleghany County,0.271



*   County and adult obesity rate:

In [15]:
df.loc[df["State"] == "NC", ["County", "Adult obesity"]].head()


Unnamed: 0,County,Adult obesity
3243,Alamance County,0.341
3244,Alamance County,0.332
3245,Alexander County,0.272
3246,Alexander County,0.283
3247,Alleghany County,0.247



*   County and excessive drinking rate:

In [14]:
df.loc[df["State"] == "NC", ["County", "Excessive drinking"]].head()


Unnamed: 0,County,Excessive drinking
3243,Alamance County,0.123
3244,Alamance County,0.123
3245,Alexander County,0.119
3246,Alexander County,0.119
3247,Alleghany County,0.165



*   County and poor mental health days:

In [13]:
df.loc[df["State"] == "NC", ["County", "Poor mental health days"]].head()


Unnamed: 0,County,Poor mental health days
3243,Alamance County,3.6
3244,Alamance County,3.6
3245,Alexander County,4.6
3246,Alexander County,4.6
3247,Alleghany County,4.4
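The four lookups above can also be combined by passing several column names to `.loc` at once. The sketch below does this for two measures on a hypothetical toy DataFrame (`toy_df`) built from the NC values shown above, then averages the two years per county with `groupby`. The averaging step is an illustrative addition, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy DataFrame using the NC values printed in this step.
toy_df = pd.DataFrame({
    "State": ["NC", "NC", "NC", "NC", "NC"],
    "County": ["Alamance County", "Alamance County",
               "Alexander County", "Alexander County", "Alleghany County"],
    "Adult smoking": [0.238, 0.238, 0.26, 0.26, 0.271],
    "Adult obesity": [0.341, 0.332, 0.272, 0.283, 0.247],
})

# Select several columns in one .loc call.
nc_cols = toy_df.loc[toy_df["State"] == "NC",
                     ["County", "Adult smoking", "Adult obesity"]]

# Average the yearly rows so each county appears once.
county_means = nc_cols.groupby("County").mean()
print(county_means)
```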


# Exporting Data
1. **Export the Entire DataFrame**

To export the entire DataFrame `df` to a CSV file:




In [24]:
# Export the entire DataFrame to a CSV file
df.to_csv('output_data.csv', index=False)




*   `index=False`: Prevents pandas from writing row indices.

*  `'output_data.csv'`: The name of the CSV file that will be created. You can change the file name and path as needed.




2. **Check the Exported File**

After exporting, check the directory where you saved the file to confirm that it was created. Whether you work in Colab or in a local environment, make sure the path you pass to `to_csv` points to a location you can write to.
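The check can also be done in code. The sketch below exports a filtered subset to a temporary directory, confirms the file exists, and reads it back to compare shapes. The names `toy_df`, `nc_only`, and `nc_subset.csv` are hypothetical, and the temporary directory is used only so the example does not touch your own files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy DataFrame.
toy_df = pd.DataFrame({
    "State": ["KS", "KY", "NC", "NC"],
    "County": ["Clay County", "Hart County", "Alamance County", "Alamance County"],
    "Premature death": [6451.0, 7807.0, 7123.0, 7291.0],
})

# Keep only the NC rows and write them out without the row index.
nc_only = toy_df[toy_df["State"] == "NC"]
out_path = os.path.join(tempfile.mkdtemp(), "nc_subset.csv")
nc_only.to_csv(out_path, index=False)

# Confirm the file exists, then read it back and compare shapes.
print(os.path.exists(out_path))
round_trip = pd.read_csv(out_path)
print(round_trip.shape == nc_only.shape)
```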

# **Conclusion**

By following the example above, you can load, explore, and analyze your `.CSV` data efficiently using Python. The workflow starts with loading the data into a pandas DataFrame and continues with inspecting its structure, filtering rows by condition, and selecting the columns you care about. You can also save the processed or filtered data for future use, which keeps your work organized and reproducible.

This approach gives you a solid framework for handling `.CSV` files and gaining deeper insight into your dataset with little effort. The methods shown here keep data analysis straightforward, whether you work locally or on a cloud platform such as Colab. Happy coding!