# SPASE Record Analysis - How to Add New Extracted Items or New Tests

If you have not viewed the related notebook, "How to Use", do so before going through this notebook. <br>

This notebook runs through how to add to this project, specifically:
1. how to add additional fields to extract from SPASE records
2. how to add the new fields to the SQLite database
3. how to add new database queries to report the results in the tables <br>

## Adding additional extraction fields

For this example, we will show how to add the ORCID ID.

Each snippet below is labeled (Code A, Code B, and so on) and belongs at the matching comment in SPASE_Scraper_Script. To find the right spot, press Ctrl-F and search for the label, for example 'Code A'.

> First, because a record may list multiple authors, the variable for ORCID needs to be a list. We also need a second variable to temporarily hold each ORCID ID, since we only return the IDs of authors that fall within our priority rules. To cover the case where no authors are provided, both ORCID variables get default values: <br>

We will call this Code A.
```python
ORCID = []
ORCID_ID = ""
```

> Next, we need to know where the value would be found. The ORCID ID would most likely appear in the Contact section, so we need to find where SPASE_Scraper_Script iterates through that section: <br>
```python
elif child.tag.endswith("Contact"):
    C_Child = child
# iterate thru Contact to find PersonID and Role
for child in C_Child:
```

> After that, we add another elif branch inside that loop to check the child nodes of Contact for the tag we are seeking, which in this case may be something like "ORCID". Then we save the text of that element in our temporary variable. The code would look similar to this: <br>

This is Code B.

```python
# find ORCID
elif child.tag.endswith("ORCID"):
    # store ORCID
    ORCID_ID = child.text
```
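
To try the Code B check outside the scraper, the sketch below parses a small made-up record with Python's standard `xml.etree.ElementTree`. The `<ORCID>` element, the sample record, and the helper name `find_orcid` are assumptions for illustration only; they are not taken from SPASE_Scraper_Script or confirmed against the SPASE schema.

```python
import xml.etree.ElementTree as ET

# Made-up record fragment. The <ORCID> element is an assumption for this
# example; check the SPASE schema for the real element name.
SAMPLE = """<Spase xmlns="http://www.spase-group.org/data/schema">
  <NumericalData>
    <ResourceHeader>
      <Contact>
        <PersonID>spase://SMWG/Person/Jane.Doe</PersonID>
        <Role>PrincipalInvestigator</Role>
        <ORCID>0000-0002-1825-0097</ORCID>
      </Contact>
    </ResourceHeader>
  </NumericalData>
</Spase>"""


def find_orcid(contact):
    """Return the text of the first child whose tag ends with 'ORCID', or ''."""
    for child in contact:
        if child.tag.endswith("ORCID"):
            # child.text is None for an empty element, so fall back to ""
            return (child.text or "").strip()
    return ""


root = ET.fromstring(SAMPLE)
# ElementTree stores tags as "{namespace}Name", which is why endswith is used
contact = next(el for el in root.iter() if el.tag.endswith("Contact"))
print(find_orcid(contact))
```

Returning an empty string when the tag is missing matches the default given to `ORCID_ID` in Code A.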

> Then, when an author is found that fits our priority rules, we add this temporary value to the ORCID list at the same point where the author name and author role are added to their lists. This keeps the ordering consistent, so each ORCID ID stays with the author it belongs to. There are 2 places where an author can be collected outside of the Publication Info section, so both of the assignments below need to be added in each of these places. <br>

This is Code C.
```python 
ORCID = [ORCID_ID]
```
And this is Code D.
```python
ORCID.append(ORCID_ID)
```
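
The difference between assigning (Code C) and appending (Code D) can be seen in a small standalone sketch. The `PRIORITY` ranks and the function `select_authors` are hypothetical simplifications made up for this example; the actual priority rules are the ones in SPASE_Scraper_Script.

```python
# Hypothetical priority ranks (lower wins); NOT the project's real rules
PRIORITY = {"PrincipalInvestigator": 0, "CoInvestigator": 1}


def select_authors(contacts):
    """Pick the highest-priority authors from (person_id, role, orcid_id) tuples.

    Returns three lists that share the same ordering.
    """
    author, author_role, orcid = [], [], []
    best = None
    for person_id, role, orcid_id in contacts:
        rank = PRIORITY.get(role)
        if rank is None:
            continue  # role is not within the priorities
        if best is None or rank < best:
            # Like Code C: a better role was found, so restart all three lists
            author, author_role, orcid = [person_id], [role], [orcid_id]
            best = rank
        elif rank == best:
            # Like Code D: same priority, so extend all three lists together
            author.append(person_id)
            author_role.append(role)
            orcid.append(orcid_id)
    return author, author_role, orcid


contacts = [
    ("A", "CoInvestigator", "id-A"),
    ("B", "PrincipalInvestigator", "id-B"),
    ("C", "PrincipalInvestigator", ""),  # no ORCID ID provided
]
print(select_authors(contacts))
```

Because the three lists are always updated in the same branch, index `i` of `orcid` refers to the same person as index `i` of `author`. In the real loop, the temporary `ORCID_ID` may also need to be reset to `""` for each Contact so that a value from one contact is not carried over to the next.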

> Lastly, add the ORCID list to the values the script returns, and edit the calls to the script in the main.py file so that they unpack the added return value.
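
As a rough illustration of why the calls must change, the sketch below uses a stand-in function, `scrape_record_stub`, which is hypothetical and much simpler than the real scraper. It shows that a call site still unpacking the old number of values fails once the extra list is returned.

```python
def scrape_record_stub():
    """Hypothetical stand-in for the scraper; returns fixed example values."""
    author = ["Doe, Jane"]
    author_role = ["PrincipalInvestigator"]
    orcid = ["0000-0002-1825-0097"]
    # The ORCID list is added to the end of the returned values
    return author, author_role, orcid


# Updated call site: one name per returned value
author, author_role, orcid = scrape_record_stub()

# Old call site: still expects two values, so Python raises ValueError
try:
    author, author_role = scrape_record_stub()
    stale_call_ok = True
except ValueError:
    stale_call_ok = False

print(orcid, stale_call_ok)
```

Adding the new value at the end of the return keeps the position of the existing values unchanged, which makes the call sites easier to update.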

## Adding a new field to the database

This section continues the ORCID ID example and shows how to add it to the SPASE_Data.db database.

> First, we need to add a column to both the MetadataEntries and TestResults tables. This can be done with the ALTER TABLE command in SQLite.

```python
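# executionALL is imported from this project's SQLiteFun module
from SQLiteFun import executionALL

# Column that stores the scraped ORCID ID for each record
executionALL("""ALTER TABLE MetadataEntries ADD COLUMN
                    ORCID_ID TEXT""")
# Column for the matching test result; SQLite has no boolean type, so INTEGER is used
executionALL("""ALTER TABLE TestResults ADD COLUMN
                    has_ORCID INTEGER""")
```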
```
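
Running ALTER TABLE a second time fails with a "duplicate column name" error, so it can help to check first. The sketch below uses Python's standard `sqlite3` module on an in-memory database rather than SPASE_Data.db. The helper `add_column_if_missing` and the stand-in table definitions are assumptions for illustration; the project's real tables have their own columns.

```python
import sqlite3


def add_column_if_missing(conn, table, column, col_type):
    """Add a column unless it already exists. Returns True if it was added.

    Table and column names cannot be bound as SQL parameters, so only pass
    trusted, hard-coded names to this helper.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
    return True


# In-memory stand-ins for the real tables (their actual schemas differ)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MetadataEntries (rowNum INTEGER PRIMARY KEY, SPASE_id TEXT)")
conn.execute("CREATE TABLE TestResults (rowNum INTEGER PRIMARY KEY, SPASE_id TEXT)")

print(add_column_if_missing(conn, "MetadataEntries", "ORCID_ID", "TEXT"))
print(add_column_if_missing(conn, "TestResults", "has_ORCID", "INTEGER"))
# A second attempt is skipped instead of raising an error
print(add_column_if_missing(conn, "MetadataEntries", "ORCID_ID", "TEXT"))
```

Existing rows get NULL in the new columns until the scraper is rerun or the values are filled in with an UPDATE.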