# Week 11 In-Class Challenge

This week, we are doing an in-class exercise.  This will be worth 5 extra credit points for each team that creates a successful solution that follows the programming guidelines we've established this semester.  All the requirements for this programming challenge are described below.  If you complete them all successfully, you will receive 5 points.  If you do not, you will receive 0 points.

Work as a group.  You will all receive the same number of points.

## Requirements
1. Your code must be a function named `week11()` that takes no parameters
2. Your `week11()` function must read this CSV from the internet and use it as input: https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv
  * This file has four columns: CODE, SHORT DESCRIPTION, LONG DESCRIPTION, and NF EXCL
  * The NF EXCL column indicates that this code is excluded from a "no fault" list related to workers' compensation insurance claims
3. Your `week11()` function must use Pandas functions to generate new columns and filter the dataframe using the following rules
   * Create a new column called "CODE TYPE" that contains only the first character of the CODE column. For example if CODE="A001" then CODE TYPE="A"
   * Create a new column called "CODE NUM" that contains only the numeric part of the CODE column and make it numeric. For example if CODE="A001" then CODE NUM=1
   * Some CODE NUM portions cannot be converted directly because they have an "X" in them.  Convert that "X" to a "." and then convert the CODE NUM to a numeric value.  For example if CODE="E1037X1" then CODE NUM=1037.1
   * Filter your results to only include those rows where NF EXCL="Y"
   * Sort your results in ascending order by CODE NUM and then by CODE TYPE
4. Use the "checker" in the last cell to confirm that your results are correct.  If the checker gives any errors, you will receive no credit.
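
The rules in step 3 can be tried on a tiny hand-made dataframe before touching the real file. The sketch below is one possible approach, not the required solution. Only the codes `A001` and `E1037X1` come from the examples above; the third row and all `NF EXCL` values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real CSV
sample = pd.DataFrame({
    "CODE": ["E1037X1", "A001", "B001"],
    "NF EXCL": ["Y", "Y", None],
})

# First character of the code, e.g. "A001" -> "A"
sample["CODE TYPE"] = sample["CODE"].str[0]

# Everything after the first character, with "X" turned into a decimal point,
# then converted to a number, e.g. "E1037X1" -> 1037.1
sample["CODE NUM"] = pd.to_numeric(
    sample["CODE"].str[1:].str.replace("X", ".", regex=False)
)

# Keep only the excluded codes, then sort by number and by type
result = (
    sample[sample["NF EXCL"] == "Y"]
    .sort_values(by=["CODE NUM", "CODE TYPE"])
    .reset_index(drop=True)
)
print(result)
```

Passing `regex=False` makes the replacement a literal string swap, so the behavior does not depend on the pandas version's default for `str.replace`.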


## Submitting
Submit the assignment by creating a folder called `week11_inclass` in the repository of one of the people in your group. At the top of your submission, enter the names of all of the people in your group.

## Scoring Rubric
If your code passes my checker included at the bottom of this page, each person on your team will earn 5 points.  If your code does not pass my checker, you will earn 0 points. This is "all or nothing" extra credit.

---

**This challenge was submitted by**

Chandra sekhar Ponugumati (banner_id:001392282)

Venkatesh Paturu (banner_id:001305109)

In [94]:
import pandas as pd

hospitals = pd.read_csv('https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv')

hospitals.shape


(21703, 4)

In [96]:
hospitals['CODE TYPE'] = hospitals['CODE'].str[0]

hospitals.head()

| | CODE | SHORT DESCRIPTION | LONG DESCRIPTION | NF EXCL | CODE TYPE |
|---|---|---|---|---|---|
| 0 | A000 | Cholera due to Vibrio cholerae 01, biovar chol... | Cholera due to Vibrio cholerae 01, biovar chol... | | A |
| 1 | B000 | Eczema herpeticum | Eczema herpeticum | | B |
| 2 | C000 | Malignant neoplasm of external upper lip | Malignant neoplasm of external upper lip | | C |
| 3 | D0000 | Carcinoma in situ of oral cavity, unspecified ... | Carcinoma in situ of oral cavity, unspecified ... | | D |
| 4 | E000 | Congenital iodine-deficiency syndrome, neurolo... | Congenital iodine-deficiency syndrome, neurolo... | | E |


In [97]:
hospitals['CODE NUM'] = pd.to_numeric(
    hospitals['CODE'].str[1:].str.replace("X", "."), errors='coerce'
)

hospitals.head()

| | CODE | SHORT DESCRIPTION | LONG DESCRIPTION | NF EXCL | CODE TYPE | CODE NUM |
|---|---|---|---|---|---|---|
| 0 | A000 | Cholera due to Vibrio cholerae 01, biovar chol... | Cholera due to Vibrio cholerae 01, biovar chol... | | A | 0 |
| 1 | B000 | Eczema herpeticum | Eczema herpeticum | | B | 0 |
| 2 | C000 | Malignant neoplasm of external upper lip | Malignant neoplasm of external upper lip | | C | 0 |
| 3 | D0000 | Carcinoma in situ of oral cavity, unspecified ... | Carcinoma in situ of oral cavity, unspecified ... | | D | 0 |
| 4 | E000 | Congenital iodine-deficiency syndrome, neurolo... | Congenital iodine-deficiency syndrome, neurolo... | | E | 0 |


In [98]:
hospitals_filtered = hospitals[hospitals['NF EXCL'] == 'Y']

hospitals_filtered = hospitals_filtered.sort_values(by=['CODE NUM', 'CODE TYPE']).reset_index(drop=True)

hospitals_filtered.shape

(1098, 6)

In [127]:
# I've provided code here to start with.

import pandas as pd

def week11():
    """() -> pd.DataFrame

    This function will process the file named in step 2 of the instructions above
    using the rules in step 3 above.  It will return a dataframe that contains
    the filtered, sorted, and enhanced results.

    For my tests, I will validate the shape to start with.
    If I have more time, I can figure out how to write tests for the other requirements.

    >>> final_data.shape
    (1098, 6)
    """

    hospitals = pd.read_csv('https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv')

    hospitals['CODE TYPE'] = hospitals['CODE'].str[0]

    def convert_code_num(code):
        """
        Replace "X" in the numeric part of CODE with "." so that the
        result can be parsed as a float value.
        """
        # Remove the first character and replace "X" with "."
        numeric_part = code[1:].replace("X", ".")
        try:
            # Convert to float if possible
            return float(numeric_part)
        except ValueError:
            # Return None if conversion fails
            return None

    # Apply the conversion function to create "CODE NUM" column
    hospitals['CODE NUM'] = hospitals['CODE'].apply(convert_code_num)

    # Step 3: Filter rows where "NF EXCL" is "Y"
    data = hospitals[hospitals['NF EXCL'] == 'Y']

    # Step 4: Sort by "CODE NUM" in ascending order, then by "CODE TYPE"
    final_data = data.sort_values(by=['CODE NUM', 'CODE TYPE']).reset_index(drop=True)

    return final_data

# Run the week11 function and print results
final_data = week11()
print(final_data)


         CODE                                  SHORT DESCRIPTION  \
0         E02       Subclinical iodine-deficiency hypothyroidism   
1         Y09                       Assault by unspecified means   
2         I10                   Essential (primary) hypertension   
3        E018  Oth iodine-deficiency related thyroid disord a...   
4         K23  Disorders of esophagus in diseases classified ...   
...       ...                                                ...   
1093  E133559  Oth diabetes with stable prolif diabetic retin...   
1094  E133591  Oth diab with prolif diab rtnop without macula...   
1095  E133592  Oth diab with prolif diab rtnop without macula...   
1096  E133593  Oth diab with prolif diab rtnop without macula...   
1097  E133599  Oth diab with prolif diab rtnop without macula...   

                                       LONG DESCRIPTION NF EXCL CODE TYPE  \
0     Subclinical iodine-deficiency hypothyroidism  ...       Y         E   
1                          As

---

## You can run your doctests this way

In [125]:
from doctest import run_docstring_examples
run_docstring_examples(week11, globs=globals(), verbose=True)

Finding tests in NoName
Trying:
    final_data.shape
Expecting:
    (1098, 6)
ok
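
The shape doctest covers only one requirement. As a rough sketch of how the other rules in step 3 might be verified, the hypothetical helper `check_week11_rules` below asserts the filter, the new columns, and the sort order on any dataframe shaped like the `week11()` result. The helper name and the toy dataframe are assumptions for illustration; they are not part of the assignment or of the instructor's checker.

```python
import pandas as pd

def check_week11_rules(df):
    """Hypothetical helper: raise AssertionError if df breaks a step 3 rule."""
    # Both derived columns must exist
    assert {"CODE TYPE", "CODE NUM"} <= set(df.columns)
    # Only rows flagged NF EXCL == "Y" may remain
    assert (df["NF EXCL"] == "Y").all()
    # CODE TYPE is the first character of CODE
    assert (df["CODE TYPE"] == df["CODE"].str[0]).all()
    # CODE NUM must be numeric
    assert pd.api.types.is_numeric_dtype(df["CODE NUM"])
    # Rows must already be sorted by CODE NUM, then CODE TYPE;
    # a stable sort leaves an already-sorted frame unchanged
    expected = df.sort_values(by=["CODE NUM", "CODE TYPE"], kind="stable")
    assert df["CODE"].tolist() == expected["CODE"].tolist()
    return True

# Made-up rows that satisfy every rule
toy = pd.DataFrame({
    "CODE": ["A001", "B001", "E1037X1"],
    "NF EXCL": ["Y", "Y", "Y"],
    "CODE TYPE": ["A", "B", "E"],
    "CODE NUM": [1.0, 1.0, 1037.1],
})
print(check_week11_rules(toy))
```

With a working solution in scope, the same helper could be called as `check_week11_rules(week11())`.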


---

## Use this code to check your output!

If you get something other than `"You did it!!"` then you still have work to do on your solution.

The feedback provided should give you some hints as to what you haven't done correctly in filtering and organizing the data.

You can run this as many times as you want.  I'm not recording who is trying what and if you're getting the right answers or not.

In [88]:
import requests

r = requests.post('https://rln3ys6dciybh6cydvapszesna0oxcyn.lambda-url.us-east-1.on.aws/',
                  headers={"content-type": "application/json"},
                  data=week11().to_json(orient='records'))

print(r.status_code)
print(r.text)

AttributeError: 'NoneType' object has no attribute 'to_json'
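
An `AttributeError` like the one above means that `week11()` returned `None` instead of a dataframe, which typically happens when the function has no `return` statement. The sketch below shows one way to catch that before posting to the checker. The helper `to_checker_payload` and the one-row dataframe are hypothetical and only illustrate the idea; the serialization call is the same `to_json(orient='records')` used in the checker cell.

```python
import json
import pandas as pd

def to_checker_payload(df):
    """Hypothetical helper: serialize a result the way the checker cell does."""
    if df is None:
        # Typical cause: the function is missing its return statement
        raise ValueError("week11() returned None; does it end with a return?")
    return df.to_json(orient="records")

# Made-up one-row frame standing in for the real result
toy = pd.DataFrame({"CODE": ["A001"], "NF EXCL": ["Y"]})
payload = to_checker_payload(toy)
print(json.loads(payload))
```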