In [1]:
import json
from KeyProcessor import KeyWebsite as Processor  # later cells refer to the class as Processor

# TEA Key Converter Tool

This tool scrapes keys sourced from the TEA website and outputs a JSON file with new mappings.

The standard workflow for getting a file and its keys from the TEA for EDA is as follows:
1. Visit the TEA TAPR page
2. Select the year of interest
3. Select TAPR Data Download
4. Select TAPR Data in Excel
5. Select Campus by District within County
6. Select Williamson
7. Select Round Rock ISD
8. Select target campuses (in this case we are focusing on high schools)
9. Select desired attributes
10. Click on the Reference Link
11. Assign the reference link to `tea_website_url` in the Variable Prerequisites section

### Requirements:
1. Web link to dataset reference from TEA
2. Required directories present - to create or check for required directories, run the code in the following cell
3. New key mappings (column names) defined by the user
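
As a rough illustration of requirement 2, the directory check can be sketched with `pathlib`. This is not the `create_required_dirs` implementation: the function name `ensure_dirs` is hypothetical, and the folder names are assumed from the output paths printed later in this notebook.

```python
from pathlib import Path

# Folder names assumed from the output paths printed later in this notebook.
REQUIRED_DIRS = ("Generated_Keys", "Processed_Keys")


def ensure_dirs(base=".", names=REQUIRED_DIRS):
    """Create any missing directories under `base` and report what happened."""
    base_path = Path(base)
    created = []
    for name in names:
        target = base_path / name
        if not target.is_dir():
            # Only create folders that are not already there.
            target.mkdir(parents=True)
            created.append(name)
    if created:
        return "Created: " + ", ".join(created)
    return "Required directories present"
```

Calling `ensure_dirs()` a second time in the same location returns the "Required directories present" message, since nothing is left to create.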


In [2]:
# Will create required directories or return a string if they are already present
Processor.create_required_dirs()

Required directories present


### Variable Prerequisites and Definitions

You will need to define the following variables before running the tool.

Required variables and their descriptions are as follows:

1. <b>`tea_website_url`</b> - TEA website link reference for the dataset columns

2. <b>`replacement_words`</b> - A dictionary containing target words to be replaced in the original dataset column descriptions

Examples:

```python
tea_website_url = 'https://rptsvr1.tea.texas.gov/perfreport/tapr/2020/xplore/ccad.html'
```

```python
replacement_words = {
    'Average': 'Avg',
    'Male': 'M',
}
```

#### Please define these variables below:

In [3]:
tea_website_url = 'https://rptsvr1.tea.texas.gov/perfreport/tapr/2020/xplore/ccad.html'
replacement_words = {
    'Science': 'Sci',
    ' American': ' Amer.',
}

assert tea_website_url != '', 'Please set the tea_website_url variable'
assert len(replacement_words) != 0, 'Please set the replacement_words variable'

## Step 1: Scrape the Website, Save Contents to JSON

By default, the `save_json` method saves the file to the `Generated_Keys` folder. If you are saving a cleaned file, set `mode='cleaned'` to save it in the `Processed_Keys` folder instead.
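
To give a feel for what a scrape step like this involves, here is a minimal sketch that turns a two-column HTML table into a code-to-description dictionary using only the standard library. It is not the `Processor.scrape()` implementation. The class `TwoColumnTableParser`, the function `table_to_dict`, and the sample markup are all hypothetical, and the real TEA reference page may be laid out differently.

```python
import json
from html.parser import HTMLParser


class TwoColumnTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def table_to_dict(html):
    """Map first-column codes to second-column descriptions."""
    parser = TwoColumnTableParser()
    parser.feed(html)
    # Rows with fewer than two <td> cells (e.g. header rows) are skipped.
    return {row[0]: row[1] for row in parser.rows if len(row) >= 2}


# Made-up sample markup; the real TEA page layout may differ.
sample_html = """
<table>
  <tr><td>CA0CAA18R</td><td>Campus 2018 ACT: All Students, ACT Average</td></tr>
  <tr><td>CA0CAM18R</td><td>Campus 2018 ACT: All Students, ACT Math Average</td></tr>
</table>
"""

keys = table_to_dict(sample_html)
print(json.dumps(keys, indent=4))
```

Once the mapping is a plain dictionary, saving it is a matter of writing it out with `json.dump`.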

In [4]:
ccad_scraper = Processor(tea_website_url)

created_dict = ccad_scraper.scrape()

ccad_scraper.save_json()

print(f'JSON file stored at: ./{ccad_scraper.filename}')
print(json.dumps(created_dict, indent=4))

JSON file stored at: ./Generated_Keys/Campus College Admissions (SAT & ACT).json
{
    "CB0CAA18R": "Campus 2018 ACT: African American Students, ACT Average",
    "CB0CAE18R": "Campus 2018 ACT: African American Students, ACT ELA Average",
    "CB0CAM18R": "Campus 2018 ACT: African American Students, ACT Math Average",
    "CB0CAC18R": "Campus 2018 ACT: African American Students, ACT Science Average",
    "CA0CAA18R": "Campus 2018 ACT: All Students, ACT Average",
    "CA0CAE18R": "Campus 2018 ACT: All Students, ACT ELA Average",
    "CA0CAM18R": "Campus 2018 ACT: All Students, ACT Math Average",
    "CA0CAC18R": "Campus 2018 ACT: All Students, ACT Science Average",
    "CI0CAA18R": "Campus 2018 ACT: American Indian Students, ACT Average",
    "CI0CAE18R": "Campus 2018 ACT: American Indian Students, ACT ELA Average",
    "CI0CAM18R": "Campus 2018 ACT: American Indian Students, ACT Math Average",
    "CI0CAC18R": "Campus 2018 ACT: American Indian Students, ACT Science Average",
    "C30CA

## Step 2: Replace Column Descriptions with Shortened Versions

Depending on your needs, the Processor `clean()` method offers a few options for renaming the column descriptions provided by TEA. Note that the method returns a copy of the JSON data it just created.

Options are:

1. Replace words using the default values, which are:
```python
default_replacements = {
    'Average': 'Avg',
    ' Students': '',
    'Male': 'M',
    'Female': 'F',
    '  ': ' ',
}
```

    This setting is on by default. To turn it off, set `override_defaults` to `True` when calling the method.

2. Replace words using the `replacement_words` you defined in the Variable Prerequisites section


Examples:
* Cleaning the data with `replacement_words` only (defaults turned off): `scraper.clean(replacement_words, override_defaults=True)`
* Cleaning the data with both the defaults and `replacement_words`: `scraper.clean(replacement_words)`
* Cleaning the data with only the defaults: `scraper.clean()`
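
The options above can be sketched as plain substring replacement. This is only an illustration of the idea, not the `clean()` implementation: the function name `shorten_descriptions` is hypothetical, and it assumes the defaults are applied first, followed by the user-supplied words, each as a case-sensitive `str.replace`.

```python
DEFAULT_REPLACEMENTS = {
    'Average': 'Avg',
    ' Students': '',
    'Male': 'M',
    'Female': 'F',
    '  ': ' ',
}


def shorten_descriptions(keys, replacement_words=None, override_defaults=False):
    """Return a new dict with each description shortened by substring replacement."""
    # Assumption: defaults come first unless they are overridden.
    replacements = {} if override_defaults else dict(DEFAULT_REPLACEMENTS)
    replacements.update(replacement_words or {})
    cleaned = {}
    for code, description in keys.items():
        for old, new in replacements.items():
            description = description.replace(old, new)
        cleaned[code] = description
    return cleaned


raw = {"CB0CAC18R": "Campus 2018 ACT: African American Students, ACT Science Average"}
print(shorten_descriptions(raw, {'Science': 'Sci', ' American': ' Amer.'}))
# {'CB0CAC18R': 'Campus 2018 ACT: African Amer., ACT Sci Avg'}
```

Because the input dictionary is never modified, the original descriptions stay available for comparison with the shortened ones.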

In the following code block, I use the defaults and the replacement words.

In [5]:
cleaned_dict = ccad_scraper.clean(replacement_words)
ccad_scraper.save_json(mode='cleaned')

print(f'JSON file stored at: ./{ccad_scraper.cleaned_filename}')
print(json.dumps(cleaned_dict, indent=4))


JSON file stored at: ./Processed_Keys/Campus College Admissions (SAT & ACT) Processed Keys.json 
{
    "CB0CAA18R": "Campus 2018 ACT: African Amer., ACT Avg",
    "CB0CAE18R": "Campus 2018 ACT: African Amer., ACT ELA Avg",
    "CB0CAM18R": "Campus 2018 ACT: African Amer., ACT Math Avg",
    "CB0CAC18R": "Campus 2018 ACT: African Amer., ACT Sci Avg",
    "CA0CAA18R": "Campus 2018 ACT: All, ACT Avg",
    "CA0CAE18R": "Campus 2018 ACT: All, ACT ELA Avg",
    "CA0CAM18R": "Campus 2018 ACT: All, ACT Math Avg",
    "CA0CAC18R": "Campus 2018 ACT: All, ACT Sci Avg",
    "CI0CAA18R": "Campus 2018 ACT: Amer. Indian, ACT Avg",
    "CI0CAE18R": "Campus 2018 ACT: Amer. Indian, ACT ELA Avg",
    "CI0CAM18R": "Campus 2018 ACT: Amer. Indian, ACT Math Avg",
    "CI0CAC18R": "Campus 2018 ACT: Amer. Indian, ACT Sci Avg",
    "C30CAA18R": "Campus 2018 ACT: Asian, ACT Avg",
    "C30CAE18R": "Campus 2018 ACT: Asian, ACT ELA Avg",
    "C30CAM18R": "Campus 2018 ACT: Asian, ACT Math Avg",
    "C30CAC18R": "Cam