# Assignment 1 (due Friday November 8, 10 PM MT)

## Instructions - Read this first!

This is an individual homework assignment. This means that:
* You may discuss the problems in this assignment with other students in this course and your instructor/TA, but YOUR WORK MUST BE YOUR OWN.
* Do not show other students your code or your work on this assignment.
* You may consult external references, but not actively receive help from individuals not involved in this course. 
* Cite all references outside of the course you used, including conversations with other students which were helpful. (This helps us give credit where it is due!). All references must use a commonly accepted reference format, for example, APA or IEEE (or another citation style of your choice).

## Submission instructions

Your submission will be graded manually. To ensure that everything goes smoothly, please follow these instructions to prepare your notebook for submission to the D2L Dropbox for Assignment 1:

* Please remove any print statements used to test your work (you can comment them out).
* Please provide your solutions where asked; please do not alter any other parts of this notebook.
* If you need to add cells to test your code, please move them to the end of the notebook before submission, or include your commented-out answers and tests in the cells provided.

### Specific instructions for Use of Generative AI (Co-Pilot, ChatGPT, and others)

* You may not copy output from any of these tools directly into your assignment. You may use these tools to help you brainstorm or understand what steps you need to solve a provided problem.
* If you used generative AI (for example ChatGPT or GitHub Co-Pilot) in this assignment, you must mark where it was used, and which product you used. **You must include a Markdown cell with a full listing of all prompts you used.**

If any of these rules seem ambiguous, please check with your instructor or with Leanne Wu (lewu@ucalgary.ca) for help interpreting them.

## Introduction


In this assignment, we will focus on familiarizing you with using Python to work with files and JSON objects. You have been provided with a JSON file, `calgary_healthcare.json`, which features data extracted from the [Open Database of Healthcare Facilities](https://www.statcan.gc.ca/en/lode/databases/odhf), made available by Statistics Canada under the [Open Government License - Canada](https://open.canada.ca/en/open-government-licence-canada). 


```python
# You may not import other libraries besides those imported below.
import csv
import json
import pandas as pd
```


## Part A: Warm-up questions (10 marks)

**Question 1 (2 marks)**

In the cell below, open the json file that has been provided to you (`calgary_healthcare.json`). Using only Python built-ins, count how many lines this file contains.

```python
with open("calgary_healthcare.json") as my_json:
    healthcare = my_json.readlines()  # list with one string per line
    print(len(healthcare))

# Answer: the file contains 38 lines, including blank and whitespace-only lines.
```

```text
38
```
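Reading the whole file with `readlines()` is fine for a 38-line file. Below is a minimal sketch of an alternative that uses only built-ins. It counts lines while iterating over the file object, so the whole file never has to sit in memory as a list. The file name `sample_lines.txt` and its contents are placeholders created only for this demonstration.

```python
def count_lines(path):
    """Count the lines in a text file by iterating over the file object lazily."""
    with open(path) as f:
        return sum(1 for _ in f)

# Placeholder file for the demo; in the notebook this would be "calgary_healthcare.json".
with open("sample_lines.txt", "w") as f:
    f.write('{"Facilities":[\n\t{\n\t}\n]}\n')

print(count_lines("sample_lines.txt"))  # 4
```

As with `readlines()`, a final line without a trailing newline still counts as a line.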


**Question 2 (1 mark)**

Use a for loop and the `readlines()` method to read each line of the file and print it out.

```python
with open("calgary_healthcare.json") as my_json:
    for line in my_json.readlines():
        # Each line already ends with "\n", so end="" stops print() from adding a blank line.
        print(line, end="")
```

```text
{"Facilities":[
	{
		"Hospitals":[
		{
			"Name": "Alberta Children's Hospital",
			"Specialty":"Pediatric",
			"Address":"28 Oki Dr. NW"
		}, 
		{
			"Name": "Peter Lougheed Centre", 
			"Address": "3500 - 26th ave. n.e. calgary ab t1y 6j4",
			"Nearby":[{"Name": "Sunridge Medical Gallery - Mental Health"}, {"Name": "Sunridge Medical Gallery - General Ambulatory"}]
		}, 
		{
			"Name": "Foothills Medical Centre", 
			"Nearby":"Tom Baker Cancer Centre", 
			"Address":"1403 - 29th st. n.w. calgary ab t2n 2t9"
		},
		{
			"Name": "Rockyview General Hospital", 
			"Address":"7007 - 14th st. s.w. calgary ab t2v 1p9"
		},
		
		{"Name": "South Health Campus"}
	]}, 

	{"Nursing and residential care facilities": 
	{
		"Company": "Intercare Corporate Group", 
		"Locations": [
		{"Name": "Chinook Care Centre"},
		{"Name": "Southwood Care Centre"}, 
		{"Name": "Brentwood Care Centre"}
		]
	}
	}
 ]
}
```


**Question 3 (2 marks)**

Use the json library to read the contents of the file into a JSON object. Output the contents of this object.

```python
with open("calgary_healthcare.json") as my_json:
    healthcare = json.load(my_json)
healthcare
```

```text
{'Facilities': [{'Hospitals': [{'Name': "Alberta Children's Hospital",
     'Specialty': 'Pediatric',
     'Address': '28 Oki Dr. NW'},
    {'Name': 'Peter Lougheed Centre',
     'Address': '3500 - 26th ave. n.e. calgary ab t1y 6j4',
     'Nearby': [{'Name': 'Sunridge Medical Gallery - Mental Health'},
      {'Name': 'Sunridge Medical Gallery - General Ambulatory'}]},
    {'Name': 'Foothills Medical Centre',
     'Nearby': 'Tom Baker Cancer Centre',
     'Address': '1403 - 29th st. n.w. calgary ab t2n 2t9'},
    {'Name': 'Rockyview General Hospital',
     'Address': '7007 - 14th st. s.w. calgary ab t2v 1p9'},
    {'Name': 'South Health Campus'}]},
  {'Nursing and residential care facilities': {'Company': 'Intercare Corporate Group',
    'Locations': [{'Name': 'Chinook Care Centre'},
     {'Name': 'Southwood Care Centre'},
     {'Name': 'Brentwood Care Centre'}]}}]}
```
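`json.load` reads from an open file object, while `json.loads` parses a string. Both return ordinary Python dicts and lists. The short sketch below shows that round trip, plus `json.dumps(..., indent=2)` for output that is easier to read than the default repr. The string is a trimmed excerpt of the provided data, used as an assumption so the example runs on its own.

```python
import json

# Trimmed excerpt of calgary_healthcare.json, held in a string.
raw = '{"Facilities": [{"Hospitals": [{"Name": "South Health Campus"}]}]}'

data = json.loads(raw)  # str to dict; json.load(f) does the same from a file object
print(type(data).__name__)  # dict
print(data["Facilities"][0]["Hospitals"][0]["Name"])  # South Health Campus

# indent=2 pretty-prints the nested structure with one key per line.
pretty = json.dumps(data, indent=2)
print(pretty)
```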

**Question 4 (5 marks)**

Using the json library again, read the contents of the file into a JSON object, and then access the data inside this object however most makes sense, using Python built-ins, or by processing your JSON object by running the results through json library methods a second time. List all names of nearby facilities in the provided file.

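```python
# Extract the names of all nearby facilities. "Nearby" is either a list of
# {"Name": ...} dicts or a single string, so both cases are handled.
nearby_facilities = []
for facility in healthcare['Facilities']:
    for hospital in facility.get('Hospitals', []):
        nearby = hospital.get('Nearby')
        if isinstance(nearby, list):
            nearby_facilities.extend(entry['Name'] for entry in nearby)
        elif isinstance(nearby, str):
            nearby_facilities.append(nearby)
nearby_facilities
```

```text
['Sunridge Medical Gallery - Mental Health',
 'Sunridge Medical Gallery - General Ambulatory',
 'Tom Baker Cancer Centre']
```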

```python
# Alternatively, index directly. This hard-codes positions, so it only works for this exact file.
print(healthcare['Facilities'][0]['Hospitals'][1]['Nearby'][0]['Name'])
print(healthcare['Facilities'][0]['Hospitals'][1]['Nearby'][1]['Name'])
print(healthcare['Facilities'][0]['Hospitals'][2]['Nearby'])
```

```text
Sunridge Medical Gallery - Mental Health
Sunridge Medical Gallery - General Ambulatory
Tom Baker Cancer Centre
```
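Both cells above depend on the exact nesting, and the indexing version also depends on exact positions. The sketch below uses a recursive search instead. It finds every `Nearby` value at any depth and then normalizes the two formats seen in this file (a list of `{"Name": ...}` dicts, or a plain string) into one flat list of names. `find_key` and `collect_nearby_names` are hypothetical helper names, and `sample` is an excerpt of the provided data.

```python
def find_key(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


def collect_nearby_names(data):
    """Flatten every "Nearby" value into a single list of facility names."""
    names = []
    for value in find_key(data, "Nearby"):
        if isinstance(value, list):  # e.g. [{"Name": "..."}, {"Name": "..."}]
            names.extend(entry["Name"] for entry in value)
        else:                        # e.g. "Tom Baker Cancer Centre"
            names.append(value)
    return names


sample = {"Facilities": [{"Hospitals": [
    {"Name": "Peter Lougheed Centre",
     "Nearby": [{"Name": "Sunridge Medical Gallery - Mental Health"},
                {"Name": "Sunridge Medical Gallery - General Ambulatory"}]},
    {"Name": "Foothills Medical Centre", "Nearby": "Tom Baker Cancer Centre"},
    {"Name": "South Health Campus"},
]}]}

print(collect_nearby_names(sample))
```

Applied to the full `healthcare` object, this should return the same three names without hard-coding the `Hospitals` key.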


## Part B: Understanding our data (12 marks)

**Question 1 (8 marks)**

First, let's take a look at the contents of the JSON file you have been provided. Imagine that we will need a CSV for each element of the file which contains data. In the Markdown cell below, identify what attributes each element should have.

**Facilities**

* Facility Type: either "Hospital" or "Nursing and Residential Care".
* Name: the facility name.
* Address: the location of the facility.
* Specialty: the specialty area (if applicable).

**Hospital**

* Name: the name of the hospital.
* Specialty: the specialty area (if applicable).
* Address: the location of the facility.
* Nearby: a list of nearby facilities (if applicable).

**NursingAndResidentialCare**

* Location Name: the name of the nursing and residential care location.
* Company Name: the name of the company managing the facility.
* Address: the location of the facility.

**Company**

* Location: the location of the facility.
* Type of Facilities: what kind of facility it is.

**Question 2 (1 mark)**

Are there any elements that are missing from the list above?

**Answer** Yes. Nearby facilities should probably be their own element: a list of each location together with its nearby facilities.

**Question 3 (3 marks)**

Of the four elements identified at the start of Part B, which is the most difficult to read from our file, and why?

**Answer**
Probably the "Nearby" facilities, and the "Locations" of the company. "Nearby" is nested inside each "Hospitals" entry, which is itself nested inside "Facilities". I have to step through each level of access, which makes it difficult to flatten consistently with a single operation. "Nearby" also has different formats across entries: it may contain a list of dictionaries (each holding the name of a nearby facility) or just a single string. Each format needs a different extraction approach, so it took some time to work out how to pull it correctly from the file. The company "Locations" are similar, because they are a list nested inside the "Nursing and residential care facilities" object.

## Part C: Converting from JSON to CSV (12 marks)

Write your own code, using Python built-ins, the `csv` and `json` modules, to convert the data in the provided file into appropriate well-formed CSVs for each of your identified elements from Part B, Question 1. You may use as many CSVs as you like.

Remember that well-formed CSVs contain one row of data for each item which is part of the CSV, in which each field is separated by commas, ends with a newline (`\n`) and **contains only a single primitive value (such as a number, a string, or a boolean value)**
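The "single primitive value" rule can be checked mechanically before a file is written. Below is a minimal sketch that assumes rows are dicts, as in Part C, and treats `None` as an empty cell. `is_well_formed` is a hypothetical helper name, and the rows are built from entries in the provided data.

```python
PRIMITIVES = (str, int, float, bool, type(None))

def is_well_formed(rows):
    """Return True if every value in every row dict is a single primitive value."""
    return all(isinstance(value, PRIMITIVES) for row in rows for value in row.values())

good = [{"Name": "Foothills Medical Centre", "Nearby": "Tom Baker Cancer Centre"}]
bad = [{"Name": "Peter Lougheed Centre",
        "Nearby": [{"Name": "Sunridge Medical Gallery - Mental Health"}]}]

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False: the "Nearby" cell holds a list
```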

You MAY add additional fields where necessary.

Each row of each of your CSVs should contain sufficient information so that they could be used in another program independently of each other. For example, you may be able to write another program which could read your CSV file about hospitals to extract addresses to map to a visualization.

_Hint: Consider some tasks which somebody might use the provided data for, and consider what groupings of information might be the most useful, including relationships between different pieces of data (they may include what you did in Part B above, or there may be other ways to organize your information usefully). Then, ensure that the CSVs you design include as much of the data from the provided JSON as possible._

**Facilities Table**

```python
# One dictionary per hospital; missing fields become None (an empty cell), not the string "NaN".
HospitalData = [
    {
        'Facility Type': 'Hospital',
        'Hospital Name': hospital.get('Name'),
        'Address': hospital.get('Address'),
        'Specialty': hospital.get('Specialty'),
    }
    for facility in healthcare['Facilities'] if 'Hospitals' in facility
    for hospital in facility['Hospitals']
]

# One dictionary per nursing and residential care location.
NursingData = [
    {
        'Facility Type': 'Nursing and residential care facilities',
        'Hospital Name': location['Name'],
    }
    for facility in healthcare['Facilities']
    if 'Nursing and residential care facilities' in facility
    for location in facility['Nursing and residential care facilities'].get('Locations', [])
]

# Combine both lists with an outer merge on the shared columns (an outer merge also sorts the rows).
FacilitiesData = pd.merge(pd.DataFrame(NursingData), pd.DataFrame(HospitalData),
                          on=["Facility Type", "Hospital Name"], how="outer")
FacilitiesData

# Cite = https://stackoverflow.com/questions/33311258/python-check-if-variable-isinstance-of-any-type-in-list
```

|   | Facility Type | Hospital Name | Address | Specialty |
|---|---|---|---|---|
| 0 | Hospital | Alberta Children's Hospital | 28 Oki Dr. NW | Pediatric |
| 1 | Hospital | Foothills Medical Centre | 1403 - 29th st. n.w. calgary ab t2n 2t9 | |
| 2 | Hospital | Peter Lougheed Centre | 3500 - 26th ave. n.e. calgary ab t1y 6j4 | |
| 3 | Hospital | Rockyview General Hospital | 7007 - 14th st. s.w. calgary ab t2v 1p9 | |
| 4 | Hospital | South Health Campus | | |
| 5 | Nursing and residential care facilities | Brentwood Care Centre | | |
| 6 | Nursing and residential care facilities | Chinook Care Centre | | |
| 7 | Nursing and residential care facilities | Southwood Care Centre | | |
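The cell above builds a DataFrame, but it does not yet write a CSV file, which Part C asks for using the `csv` module. Below is a minimal sketch of that last step with `csv.DictWriter`, assuming rows are dicts like `HospitalData` and `NursingData`. Three assumptions to note:

* The output file name `facilities.csv` is a placeholder.
* This sketch uses a neutral `Facility Name` column instead of `Hospital Name`.
* `restval=""` (the default, spelled out here) fills in columns a row does not have, and `None` values are also written as empty cells.

`lineterminator="\n"` matches the requirement that rows end with `\n`, because the csv module's default is `\r\n`.

```python
import csv

# Excerpt of the rows built above; the nursing row has no Address or Specialty keys.
rows = [
    {"Facility Type": "Hospital", "Facility Name": "Alberta Children's Hospital",
     "Address": "28 Oki Dr. NW", "Specialty": "Pediatric"},
    {"Facility Type": "Hospital", "Facility Name": "South Health Campus",
     "Address": None, "Specialty": None},  # None is written as an empty cell
    {"Facility Type": "Nursing and residential care facilities",
     "Facility Name": "Chinook Care Centre"},
]
fieldnames = ["Facility Type", "Facility Name", "Address", "Specialty"]

# newline="" lets the csv module control line endings itself.
with open("facilities.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm that each row has one value per column.
with open("facilities.csv", newline="") as f:
    for record in csv.DictReader(f):
        print(record)
```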


**Hospital Table**

```python
# One row per hospital. "Nearby" is either a list of {"Name": ...} dicts or a single
# string, so list entries are joined with "; " to keep each cell a single string.
HospitalData = [
    {
        'Facility Type': 'Hospital',
        'Hospital Name': hospital.get('Name'),
        'Address': hospital.get('Address'),
        'Nearby Facilities': '; '.join(nearby['Name'] for nearby in hospital['Nearby'])
                             if isinstance(hospital.get('Nearby'), list)
                             else hospital.get('Nearby'),
        'Specialty': hospital.get('Specialty'),
    }
    for facility in healthcare['Facilities'] if 'Hospitals' in facility
    for hospital in facility['Hospitals']
]

# Create a DataFrame from the list of dictionaries and display it
HospitalData_df = pd.DataFrame(HospitalData)
HospitalData_df

# Cite = https://stackoverflow.com/questions/33311258/python-check-if-variable-isinstance-of-any-type-in-list
```

|   | Facility Type | Hospital Name | Address | Nearby Facilities | Specialty |
|---|---|---|---|---|---|
| 0 | Hospital | Alberta Children's Hospital | 28 Oki Dr. NW | | Pediatric |
| 1 | Hospital | Peter Lougheed Centre | 3500 - 26th ave. n.e. calgary ab t1y 6j4 | Sunridge Medical Gallery - Mental Health; Sunr... | |
| 2 | Hospital | Foothills Medical Centre | 1403 - 29th st. n.w. calgary ab t2n 2t9 | Tom Baker Cancer Centre | |
| 3 | Hospital | Rockyview General Hospital | 7007 - 14th st. s.w. calgary ab t2v 1p9 | | |
| 4 | Hospital | South Health Campus | | | |
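Joining nearby names with `"; "` keeps each cell a single string, but any program that reads the file then has to split that string again. An alternative sketch is a separate long-format table with one row per (hospital, nearby facility) pair, so every cell holds exactly one name. The file name `hospital_nearby.csv` and the helper `nearby_pairs` are hypothetical, and `hospitals` is an excerpt of the provided data.

```python
import csv

hospitals = [
    {"Name": "Peter Lougheed Centre",
     "Nearby": [{"Name": "Sunridge Medical Gallery - Mental Health"},
                {"Name": "Sunridge Medical Gallery - General Ambulatory"}]},
    {"Name": "Foothills Medical Centre", "Nearby": "Tom Baker Cancer Centre"},
    {"Name": "Rockyview General Hospital"},  # no "Nearby" key, so no rows
]

def nearby_pairs(hospitals):
    """Yield (hospital name, nearby facility name) for every nearby facility."""
    for hospital in hospitals:
        nearby = hospital.get("Nearby", [])
        if isinstance(nearby, str):  # a single name stored as a plain string
            nearby = [{"Name": nearby}]
        for entry in nearby:
            yield hospital["Name"], entry["Name"]

with open("hospital_nearby.csv", "w", newline="") as f:
    writer = csv.writer(f, lineterminator="\n")
    writer.writerow(["Hospital Name", "Nearby Facility Name"])
    writer.writerows(nearby_pairs(hospitals))
```

Peter Lougheed Centre then appears on two rows. The table can be joined back to the Hospital table on `Hospital Name`.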


**Nursing and Residential Care and Company Table**

```python
# One row per nursing and residential care location, with its managing company.
FacilityData = [
    {
        'Facility Type': 'Nursing and residential care facilities',
        'Hospital Name': location['Name'],
        'Company': facility['Nursing and residential care facilities'].get('Company'),
    }
    for facility in healthcare['Facilities']
    if 'Nursing and residential care facilities' in facility
    for location in facility['Nursing and residential care facilities'].get('Locations', [])
]

# Convert to a DataFrame and display it
FacilityData_df = pd.DataFrame(FacilityData)
FacilityData_df
```

|   | Facility Type | Hospital Name | Company |
|---|---|---|---|
| 0 | Nursing and residential care facilities | Chinook Care Centre | Intercare Corporate Group |
| 1 | Nursing and residential care facilities | Southwood Care Centre | Intercare Corporate Group |
| 2 | Nursing and residential care facilities | Brentwood Care Centre | Intercare Corporate Group |


## Part D: Reflection (6 marks)

Consider the following questions, and answer them in the cell below. 

- Consider your work in this assignment, and especially in Parts B and C. Consider the work you did to understand the structure of the JSON, as well as in converting the JSON file into a set of one or more CSVs. Describe a task you completed which you feel was the most uncomfortable part of the assignment.

- Why was this part the most uncomfortable? Describe any gaps you can identify in your current skills, knowledge or practice which this task exposed.

- Consider what steps you might have needed to complete this assignment using pandas, and the few methods you have seen in pandas for working with JSON, either in class, or in the quiz. Do you think this would have helped you understand the format of JSON files better? Why or why not?





**Answer**

The most uncomfortable part of the assignment was accessing nested JSON data in a consistent way. Indexing would have worked if every entry had the same structure. However, some entries, such as the nearby facilities, contained their own dictionaries, so it was hard to stay consistent and I had to find alternative methods. I think this was uncomfortable because I have never really worked with JSON files before; most of the data I work with is saved in CSV files. This assignment really exposed me to JSON and its complexities.

This part was challenging because I had gaps in my understanding of how to navigate and manipulate nested structures, especially in JSON format. I realized that I need to build more skills in handling non-tabular data, particularly when extracting nested elements and transforming them into more familiar formats.

Using pandas for this assignment was helpful, but it has limits for understanding JSON structures deeply. Pandas provides some useful methods for handling JSON data, such as `json_normalize`, which can simplify flattening nested data. However, these methods only go so far when the JSON structure is complex or deeply nested. While pandas made certain parts of the process smoother, I took a long route for much of the work and still feel it could be simplified, which I will keep working on as I continue to learn.
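For reference, below is a short sketch of what `pd.json_normalize` does with the nursing-care part of the file. `record_path` expands the `Locations` list into one row per entry, and `meta` copies `Company` onto each of those rows. The `nursing` dict is an excerpt of the provided data. The mixed-format `Nearby` field is the kind of case where this approach needs extra preprocessing: `record_path` expects a list at that path, so the single-string entry would have to be converted first.

```python
import pandas as pd

nursing = {
    "Company": "Intercare Corporate Group",
    "Locations": [
        {"Name": "Chinook Care Centre"},
        {"Name": "Southwood Care Centre"},
        {"Name": "Brentwood Care Centre"},
    ],
}

# One row per entry in "Locations"; "Company" is repeated onto every row.
nursing_df = pd.json_normalize(nursing, record_path="Locations", meta=["Company"])
print(nursing_df)
```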