# Project 4a: Goals and Deliverables

The goals of this assignment are:

* To work with the object-oriented version of our corpus code.
* To make a web app (!) that we can use to analyze text data.

Here, **in order**, are the steps you should do to successfully complete this project:

1. From Moodle, accept the assignment. Open and set up a codespace (install a Python kernel and select it).
2. I wrote the comments; you write the code! Modify `spacy_on_corpus.py` following the instructions in this notebook.
3. Complete the notebook and commit it to GitHub. Make sure to commit the notebook in a "run" state (all cells executed, with their outputs saved)!
4. Edit the README.md file. Provide your name, your class year, links to or descriptions of any extensions, and a list of the resources you used.
5. Commit your code often. We will take the last commit before the deadline as your submission of the project.

Possible extensions (from least points to most points):

* Modify the token, entity, and noun chunk get count methods so they count only lowercased tokens, entities, and noun chunks.
* Modify the [styling](https://anvil.works/learn/tutorials/using-material-3) of the web app. 
* To the screen `Build Corpus` in the web app, add the ability for the user to choose the language of their input documents.
* To the screen `Analyze Document` in the web app, add the ability for the user to choose a value for `top_k` and to choose which token and entity tags to exclude.
* Plot more than one analysis at a time in `Analyze Corpus` (see [this page](https://anvil.works/docs/client/components/plots)).
* Add the ability for a user to enter jsonl in the input text area on the `Build Corpus` screen.
* If you added paragraphs to project 3c, port that over to project 4a.
* Add some metadata analysis and visualization on a fourth screen.
* Your other ideas are welcome! If you'd like to discuss one with Dr Stent, feel free.

# Setup

## Install Our Packages

On the command line (in the terminal), type:

% `pip install -r requirements.txt`

## Make Sure We Can Work With .py Files We Are Editing

Run the code cell below.

In [None]:
# Automatically reload your external source code
%load_ext autoreload
%autoreload 2

## Make a Webapp

1. Click on this link: [https://anvil.works/build#clone:CF4CM3ES5C5UDCV3=P2NOWAEOD6CN6KDDZ73HFVON](https://anvil.works/build#clone:CF4CM3ES5C5UDCV3=P2NOWAEOD6CN6KDDZ73HFVON).
2. Log in using your Colby email address (with Google).
3. Agree to clone the app: 

![image.png](attachment:image.png)

4. Click the blue and white `+`, then choose Uplink:

![image-2.png](attachment:image-2.png)

5. Click `+Enable server Uplink`
6. Copy the code and paste it in the code cell below.
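The code you copy from Anvil typically consists of `import anvil.server` followed by `anvil.server.connect("<your key>")`. As a minimal sketch, assuming you would rather not commit the key itself to GitHub, the helper below reads it from an environment variable. The function name `connect_to_anvil` and the variable name `ANVIL_UPLINK_KEY` are illustrative assumptions, not part of the assignment.

```python
import os


def connect_to_anvil(uplink_key=None, env_var="ANVIL_UPLINK_KEY"):
    """Connect to Anvil if a key is available; return True if connected.

    Hypothetical helper: its name and the environment variable name are
    assumptions made for this sketch.
    """
    key = uplink_key or os.environ.get(env_var)
    if not key:
        print(f"No Uplink key found; set {env_var} or pass uplink_key.")
        return False
    # Imported here so the sketch can be run even before a key is set up.
    import anvil.server  # provided by the anvil-uplink package
    anvil.server.connect(key)
    return True
```

Pasting the two lines from Anvil directly into the cell below works just as well; this variant only keeps the key out of the committed notebook.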

In [None]:
# import anvil server, and connect using your API key


# Implement Anvil Server Code

For this project, we will use a **client-server** model. The Anvil website will host the client (the web app). The GitHub codespace will host the server (the code that knows about a corpus).

We do this because we need more specialized Python packages (like spaCy) and more memory than the free version of Anvil offers.

In the code cell below, implement the functions. Each function is decorated with `@anvil.server.callable` because the Anvil client can call it.

In [None]:
# import corpus from spacy_on_corpus

# make a corpus instance called my_corpus

@anvil.server.callable
def load_file(filename, file_contents):
    """Call build_corpus on file_contents, giving it name filename
    
    :param filename: the filename we want to store file_contents in
    :type filename: str
    :param file_contents: the contents we want to use to build / augment my_corpus
    :type file_contents: byte stream
    """
    # first we write file_contents to a file which will have name inputs/filename
    with open('inputs/' + filename, 'wb') as f:
        f.write(file_contents.get_bytes())
    # You call build_corpus on inputs/filename, giving it my_corpus as a keyword argument

@anvil.server.callable
def add_document(text):
    """Add a document to my_corpus using text.
    
    :param text: the text we want to add to my_corpus
    :type text: str
    """
    # You add a document to my_corpus using text and give it a unique id
    # HINT: try giving it an id corresponding to the size of my_corpus
    pass

@anvil.server.callable
def clear():
    """Empty my_corpus."""
    # You implement this using an instance method of dict
    pass

@anvil.server.callable
def get_corpus_tokens_counts(top_k=25):
    """Get the token counts from my_corpus.
    
    :param top_k: the top_k tokens to return
    :type top_k: int
    :returns: a list of pairs (item, frequency)
    :rtype: list
    """
    # You return the token counts
    pass

@anvil.server.callable
def get_corpus_entities_counts(top_k=25):
    """Get the entity counts from my_corpus.
    
    :param top_k: the top_k entities to return
    :type top_k: int
    :returns: a list of pairs (item, frequency)
    :rtype: list
    """
    # You return the entity counts
    pass

@anvil.server.callable
def get_corpus_noun_chunks_counts(top_k=25):
    """Get the noun chunk counts from my_corpus.
    
    :param top_k: the top_k noun chunks to return
    :type top_k: int
    :returns: a list of pairs (item, frequency)
    :rtype: list
    """
    # You return the noun chunk counts
    pass

@anvil.server.callable
def get_corpus_tokens_statistics():
    """Get the token statistics from my_corpus.
    
    :returns: basic statistics suitable for printing
    :rtype: str
    """
    # You return the token statistics
    pass

@anvil.server.callable
def get_corpus_entities_statistics():
    """Get the entity statistics from my_corpus.
    
    :returns: basic statistics suitable for printing
    :rtype: str
    """
    # You return the entity statistics
    pass

@anvil.server.callable
def get_corpus_noun_chunks_statistics():
    """Get the noun chunk statistics from my_corpus.
    
    :returns: basic statistics suitable for printing
    :rtype: str
    """
    # You return the noun chunk statistics
    pass

@anvil.server.callable
def get_token_cloud():
    """Get the token cloud for my_corpus.
    
    :returns: an image
    :rtype: plot
    """
    # You get the token counts

    # You make the word cloud if token_counts is not None
    pass

@anvil.server.callable
def get_entity_cloud():
    """Get the entity cloud for my_corpus.
    
    :returns: an image
    :rtype: plot
    """
    # You get the entity counts

    # You make the entity cloud if entity_counts is not None
    pass

@anvil.server.callable
def get_noun_chunk_cloud():
    """Get the noun chunk cloud for my_corpus.
    
    :returns: an image
    :rtype: plot
    """
    # You get the noun chunk counts

    # You make the noun chunk cloud if chunk_counts is not None
    pass

@anvil.server.callable
def get_document_ids():
    """Get the ids of all documents in the corpus.
    
    :returns: the document ids
    :rtype: list[str]
    """
    # You get the list of document ids in the corpus
    pass

@anvil.server.callable
def get_doc_markdown(doc_id):
    """Get the document markdown for a document in my_corpus.
    
    :param doc_id: a document id
    :type doc_id: str
    :returns: markdown
    :rtype: str
    """
    # You do it!
    pass
  
@anvil.server.callable
def get_doc_table(doc_id):
    """Get the document table for a document in my_corpus.
    
    :param doc_id: a document id
    :type doc_id: str
    :returns: markdown
    :rtype: str
    """
    # You do it!
    pass

@anvil.server.callable
def get_doc_statistics(doc_id):
    """Get the document statistics for a document in my_corpus.
    
    :param doc_id: a document id
    :type doc_id: str
    :returns: markdown
    :rtype: str
    """
    # You do it!
    pass
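Several of the hints above rely on ordinary Python behavior that you can try out on its own first. The sketch below uses a plain `dict` named `toy_corpus` as a stand-in; it is an assumption for illustration and does not use the class in `spacy_on_corpus.py`. It shows a size-based id, the "list of pairs (item, frequency)" shape that `collections.Counter.most_common` produces, and emptying a dict in place.

```python
from collections import Counter

# A plain dict standing in for a corpus (hypothetical; not the real my_corpus).
toy_corpus = {}


def toy_add(text):
    """Store text under an id equal to the current size of the dict."""
    doc_id = str(len(toy_corpus))
    toy_corpus[doc_id] = text
    return doc_id


toy_add("the cat sat")
toy_add("the dog sat down")

# Count whitespace-separated words across all stored texts.
counts = Counter(word for text in toy_corpus.values() for word in text.split())

# most_common(k) returns a list of (item, frequency) pairs, most frequent first.
top_two = counts.most_common(2)
print(top_two)

# dict.clear() empties the dict in place.
toy_corpus.clear()
print(len(toy_corpus))
```

Note that size-based ids stay unique only as long as documents are never removed one at a time; emptying the whole collection and starting again from zero is fine.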

# Test The Web App

Perform the following steps in order:

1. Open the web app
2. Run the code cell at the end of this section (the one that starts the server)
3. Hit Clear in the web app
4. Use the file uploader to upload `files.jsonl.zip`
5. Go to the Analyze Corpus screen
6. Get the counts, statistics and word cloud for tokens; take screenshots and paste them below
7. Get the counts, statistics and word cloud for entities; take screenshots and paste them below
8. Get the counts, statistics and word cloud for noun chunks; take screenshots and paste them below
9. Go to the Analyze Documents screen
10. Render a document as markdown; take a screenshot and paste it below
11. Render a document table; take a screenshot and paste it below
12. Render statistics for a document; take a screenshot and paste it below

The code cell at the end of this section starts an *infinite loop* as the server waits for input from the client. To stop it, interrupt the execution of the code cell or restart the notebook. Before closing your laptop, **always interrupt this infinite loop**. Otherwise you will run out of coding time in Codespaces.
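In Jupyter, interrupting a running cell normally raises `KeyboardInterrupt` inside the code that is executing. As a minimal sketch, the hypothetical wrapper below (the name `run_until_interrupted` is an assumption, not part of the assignment) shows how a blocking wait can be stopped cleanly that way.

```python
def run_until_interrupted(wait_fn):
    """Call a blocking function and report how it ended.

    Hypothetical helper for illustration; wait_fn is any function that
    blocks, such as anvil.server.wait_forever.
    """
    try:
        wait_fn()
    except KeyboardInterrupt:
        # Raised when you interrupt the cell (or press Ctrl+C in a terminal).
        print("Server stopped by interrupt.")
        return "stopped"
    return "finished"
```

With the Uplink connected, `run_until_interrupted(anvil.server.wait_forever)` would presumably behave like the plain call in the final code cell, with a short message printed when you interrupt it.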


## Token statistics

## Token counts

## Token cloud

## Entity statistics

## Entity counts

## Entity cloud

## Chunk statistics

## Chunk counts

## Chunk cloud

## Document as markdown

## Document as table

## Document statistics

In [None]:
# start the server
anvil.server.wait_forever()

**Always interrupt the server before leaving the codespace!**

# Resources

* https://anvil.works/docs/uplink/quickstart
* https://anvil.works/docs/client/components/basic#fileloader
* https://anvil.works/docs/client/components/plots