# Week 11 In-Class Challenge

This week, we are doing an in-class exercise.  This will be worth 5 extra credit points for each team that creates a successful solution that follows the programming guidelines we've established this semester.  All the requirements for this programming challenge are described below.  If you complete them all successfully, you will receive 5 points.  If you do not, you will receive 0 points.

Work as a group.  You will all receive the same number of points.

## Requirements
1. Your code must be a function named `week11()` that takes no parameters
2. Your `week11()` function must read this CSV from the internet and use it as input: https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv
  * This file has three columns: CODE, SHORT DESCRIPTION, LONG DESCRIPTION, and NF EXCL
  * The NF EXCL indicates that this code is excluded from a "no fault" list related to workers compensation insurance claims
3. Your `week11()` function must use Pandas functions to generate new columns and filter the dataframe using the following rules
   * Create a new column called "CODE TYPE" that contains only the first character of the CODE column. For example if CODE="A001" then CODE TYPE="A"
   * Create a new column called "CODE NUM" that contains only the numeric part of the CODE column and make it numeric. For example if CODE="A001" then CODE NUM=1
   * Some CODE NUM portions cannot be converted directly because the have an "X" in them.  Convert that "X" to a "." and then conver the CODE NUM to a numeric value.  For example if CODE="E1037X1" then CODE NUM=1037.1
   * Filter your results to only include those rows where NF EXCL="Y"
   * Sort your results in ascending order by CODE NUM and then by CODE TYPE
4. Use the "checker" in the last cell to confirm that your results are correct.  If the checker gives any errors, you will receive no credit.


## Submitting
Submit the assignment by creating a folder called week11_inclass in the repository of one of the people in your group. At the top of your submission, enter the names of all of the people in your group..

## Scoring Rubric
If your code passes my checker included at the bottom of this page, each person on your team will earn 5 points.  If you code does not pass my checker, you will earn 0 points. This is "all or nothing" extra credit.

---

Tarun Kumar Paidipamula,
Harshini Nandigama,
Dhruva Kumar Chatarajupalli,
Latha Reddy Battula,
Yaswanth Reddy Gogireddy.

In [40]:
# I've provide you code here to start with.

import pandas as pd

def week11():
    """() -> pd.DataFrame

    This function will process the file named in step 2 of the instructions above
    using the rules in step 3 above.  It will return a dataframe that contains
    the filtered, sorted, and enhanced results.

    For my tests, I will validate the shape to start with.
    If I have more time, I can figure out how to write tests for the other requirements.

    >>> week11().shape
    (1090, 6)
    """
    hospitals = pd.read_csv('https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv')

    import pandas as pd

def week11():
    """
    Processes the ICD-10 CSV file from the internet, applies transformations and filters,
    then returns the resulting DataFrame.

    The function follows these steps:
    1. Loads the CSV file from the provided URL.
    2. Creates a "CODE TYPE" column containing the first character of the "CODE" column.
    3. Creates a "CODE NUM" column containing the numeric portion of the "CODE" column.
       - If "CODE" contains an "X" within the numeric part, it is replaced by a decimal (".").
    4. Filters the DataFrame to only include rows where "NF EXCL" is "Y".
    5. Sorts the DataFrame by "CODE TYPE" (ascending) and then by "CODE NUM" (ascending).

    Returns:
        pd.DataFrame: The processed, filtered, and sorted DataFrame.
    """
    # Load the CSV file from the provided URL
    url = 'https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv'
    df = pd.read_csv(url)


    #  Create the 'CODE TYPE' column by extracting the first character of the 'CODE' column
    df['CODE TYPE'] = df['CODE'].str[0]

    #  Create 'CODE NUM' column
    # Extract the numeric part, replace 'X' with '.', and convert to a numeric type
    df['CODE NUM'] = df['CODE'].str[1:].str.replace('X', '.', regex=False)
    df['CODE NUM'] = pd.to_numeric(df['CODE NUM'], errors='coerce')

    #  Filter rows where 'NF EXCL' == 'Y'
    df_filtered = df[df['NF EXCL'] == 'Y']

    #  Sort by 'CODE TYPE' and then by 'CODE NUM'
    df_sorted = df_filtered.sort_values(by=['CODE TYPE', 'CODE NUM'], ascending=[True, True])

    # Return
    return df_sorted


In [41]:
hospitals = pd.read_csv('https://hds5210-data.s3.amazonaws.com/Section111ValidICD10-Jan2024.csv')
hospitals['NF EXCL'].value_counts()

Unnamed: 0_level_0,count
NF EXCL,Unnamed: 1_level_1
Y,1098


---

## You can run your doctests this way

In [42]:
from doctest import run_docstring_examples
run_docstring_examples(week11, globs=globals(), verbose=True)

Finding tests in NoName


---

## Use this code to check your output!

If you get something other than `"You did it!!"` then you still have work to do on your solution.

The feedback provided should give you some hints as to what you haven't done correctly in filtering and organizing the data.

You can run this as many times as you want.  I'm not recording who is trying what and if you're getting the right answers or not.

In [43]:
import requests

r = requests.post('https://rln3ys6dciybh6cydvapszesna0oxcyn.lambda-url.us-east-1.on.aws/',
                  headers={"content-type": "application/json"},
                  data=week11().to_json(orient='records'))

print(r.status_code)
print(r.text)

200
"You did it!!"
