# Digital reporting

This chapter is about how to share Python code and the outcome of your analyses with others. While it is possible to convert notebooks to conventional documents such as PDF, we are going to focus on online platforms. In my opinion, the best ways to share your work are code hosting platforms such as GitHub and interactive web applications. This is why we take a look at Git, GitHub and Streamlit in this chapter.

## Git and Github

Git is a popular version control system created by Linus Torvalds in 2005. It is used for tracking code changes. Git manages projects by *repositories* and can be used for cooperative work. Once they are initialized, repositories act as a central hub for your project's files, allowing team members to collaborate effectively without stepping on each other's toes. Of course, a team can also only consist of one person, you. To work locally, you create a local copy of the project on your own machine by cloning. This is your personal workspace where you can make changes without affecting the main project. It's like having your own sandbox to build and experiment in, ensuring that your initial trials and errors don't disrupt the overall project flow.

Once you've made changes in your local copy, the next step involves controlling and tracking these modifications. This is where staging and committing come into play. Staging allows you to select which changes you wish to mark as ready for a commit. Think of it as preparing a list of items you're about to check out at a store. Committing is the act of finalizing this list, essentially saying, "These changes are good to go." It's a snapshot of your work at a specific point in time, making it easier to track progress and revert changes if necessary.

However, projects are seldom a straight path. They branch out, requiring simultaneous development on different features or versions. This is where branching and merging become invaluable. By creating branches, you can work in isolation from the main project line, allowing for experimentation and development without risking the stability of the main project. Merging, on the other hand, is how you reintegrate these divergent paths, combining the fruits of separate labors into a single, cohesive project.

As the project evolves, staying updated is crucial. Pulling is the process of fetching the latest version of the project from the repository and merging it with your local copy. This ensures that you're always working with the most recent information, reducing the likelihood of conflicts and redundancies.

Finally, after diligent work on your part, comes the moment to share your contributions with the team. Pushing is how you upload your local updates to the main project, offering your improvements for others to use and build upon. It's the culmination of your effort, a way to visibly advance the project and collaborate effectively with your peers.

Git should not be confused with [GitHub](https://github.com/), which builds on Git. GitHub is a powerful and widely used platform for version control and collaboration, allowing developers from around the world to work together on projects of all sizes.

GitHub was launched in 2008 by Tom Preston-Werner, Chris Wanstrath, and PJ Hyett. The platform was created to facilitate the sharing and development of code, enabling developers to collaborate on projects without being hindered by geographical constraints. Its primary aim is to streamline the process of coding, review, and software development, making it accessible and manageable for developers of all skill levels.

The key features of Github are:

* Repositories: At the heart of GitHub are repositories, which are essentially folders containing a project's files along with the history of changes made to those files. Repositories can be public or private, supporting both open-source projects and confidential business work.
* Forking and Cloning: GitHub allows users to fork (copy) repositories, enabling them to make changes without affecting the original project. Cloning gives developers a local copy of a repository to work on, providing flexibility in how and where they code.
* Branching and Merging: These features facilitate the development of new features or testing out ideas in separate branches from the main project. Merging allows for the integration of these changes into the main project after proper review and testing.
* Pull Requests: A key collaboration feature, pull requests let developers notify team members about changes they've pushed to a branch in a repository. This opens up a platform for discussion, review, and eventually merging those changes into the main branch.
* Issues and Tracking: GitHub provides tools for tracking issues and tasks related to a project. Users can create, assign, and comment on issues, making it easier to manage project milestones and bugs.

Finally, note that Git can be used without GitHub, but not the other way around. Nevertheless, most people use both. In the next subchapter, we go through some basic steps of a workflow which uses Git and GitHub and demonstrates how the analysis in a Jupyter notebook can be made accessible to others.

## Demo project

First, create a demo folder at a location of your choice on your system. In my scenario it is called *digital_reporting_demo*, but you can name it as you like. In this folder I copy a notebook which I have already prepared. In your case, you may start with a notebook of your own from a previous analysis in this course. Note that all following steps reflect my personal preferences, and there are multiple other ways to accomplish these tasks.

To track changes in your demo folder, use the terminal and move to your folder. Inside the folder call `git init` to initialize the local repository. If you receive an error message that git is not installed go to this [web page](https://git-scm.com/downloads) and download and install the latest version. If you are using git for the first time, it might be a good idea to add your name and email address to the configuration by `git config --global user.name "my name"` and `git config --global user.email "my email"`. Note that this is optional.

So far, not much has happened. Whenever you want to get the current status of your Git tracked repository, you can call `git status`, which currently tells you that there are no commits yet and lists your notebook as an untracked file. This means Git sees the file, but does not track it yet. To change this, you need to add the folder content to the staging environment and commit it. To add something to the staging environment, use `git add`. You can either do this for every file by `git add filename` or for all current content in the folder by `git add --all`. Once you have done this, check the status again. It should still tell you that there are no commits yet, but the files are now listed as changes to be committed. Next, we can commit the status of our work. Committing something via Git creates a snapshot of the current status. Later, you can go back to this status if you want. To do this with Git call `git commit -m "a message which explains the status or change"`. Note that a commit should always be done with a message that gives a brief explanation of the project's status. To examine the history of commits to the repository, you can use the `git log` command.

One of the most powerful concepts of Git when working in a team is *branching*. A branch is a separate environment in your repository which can be created to work on the project without altering the original version. The purpose is to include new changes only once we are sure that they work and do not cause any problems. The process is to create a new branch, make adjustments and merge the branch into the current version of the project.

If you call `git branch` you can see the current branches in your repository. Currently, it should only be the *main* branch (older Git versions may call it *master*). To create a new branch call `git branch desired_branch_name`. Calling `git branch` should now show you two branches. The one marked with an asterisk is the current branch. To switch to the new branch call `git checkout desired_branch_name`. Now add a new file to your project, e.g., by calling `nano Readme.md` in the terminal (or in some other way). Write some text you like, save and close. Now, call `git status` which should tell you that there are untracked files on your branch. Next, stage and commit these changes for the branch as before by using `git add` and `git commit`.

As a result, we now have two branches (this one and main) with different versions of the repository. To examine this, first call `ls` in the terminal. You should see two files (the one we started with and the one we just added). Now switch back to the main branch by using `git checkout main` and call `ls` again. You should only see the first file. This is what it is all about!

To include the new changes in the main branch, make sure you are currently on the *main* branch and call `git merge branch_name_to_merge`. The output should tell you that a fast-forward merge has been done. This means that Git sees the changes as a direct continuation of the main branch. If the purpose of the branch has been fulfilled, we can delete it with `git branch -d branch_name`.

If different branches are used at the same time, merge conflicts may occur. See this [example](https://www.w3schools.com/git/git_branch_merge.asp?remote=github) for how to examine and fix conflicts.
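For readers who would like to replay the branching steps without touching a real project, the sequence can be scripted from Python. This is only a sketch under a few assumptions: Git is installed, the helper `git`, the branch name `add_readme` and the demo identity are made up for illustration, and everything happens in a temporary folder which is deleted afterwards.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` with a throwaway identity for commits."""
    cmd = ["git", "-c", "user.name=Demo User", "-c", "user.email=demo@example.com", *args]
    subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)


def visible_files(repo):
    """Return the sorted file names in the working folder, ignoring .git."""
    return sorted(p.name for p in repo.iterdir() if p.name != ".git")


def replay_branch_workflow():
    """Replay init, commit, branch and merge; return the files on main before and after."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        # Name the initial branch "main" regardless of the local Git default.
        git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
        (repo / "example_notebook.ipynb").write_text("{}")  # placeholder file
        git(repo, "add", "--all")
        git(repo, "commit", "-m", "add example notebook")

        # Create a branch, switch to it and commit a new file there.
        git(repo, "branch", "add_readme")
        git(repo, "checkout", "add_readme")
        (repo / "Readme.md").write_text("# Demo project\n")
        git(repo, "add", "Readme.md")
        git(repo, "commit", "-m", "add readme")

        # Back on main the new file is absent until the branch is merged.
        git(repo, "checkout", "main")
        before_merge = visible_files(repo)
        git(repo, "merge", "add_readme")
        git(repo, "branch", "-d", "add_readme")
        return before_merge, visible_files(repo)


workflow_result = replay_branch_workflow()
print(workflow_result)
```

Before the merge only the notebook is listed on the main branch, afterwards the *Readme.md* file appears as well, which mirrors what the two `ls` calls show in the terminal.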

This is how we can use Git locally to manage changes of a project. If we want to share the project with others, we can make use of GitHub. You need an account and have to sign in to do the following steps. Once you are signed in at GitHub, you can create an online repository by clicking on the plus symbol in the upper right of the page and selecting *new repository*. Next, give the repository a name. I prefer to use the same name as on my local system, but this is optional. Click on *create repository*. GitHub will display some quick setup recommendations. As we already have a local repository on our system, we want to push an existing repository from the command line. Copy the displayed code into your terminal and run it.

If everything worked as it should, you can refresh your page on GitHub and you should see that the content of the online repository matches the content of your local repository. You should also see your notebook, and if you click on it, its content is displayed. As the output of your notebook is rendered, this is a powerful way to share your analyses with others who do not have any knowledge of Python. See this [link](https://github.com/RalfKellner/digital_reporting_demo/blob/main/example_notebook.ipynb) to examine my notebook in the online repository.

To see how Git and GitHub work together, let us change a file in the online repository. In my example, I added a *Readme.md* file whose content is displayed on the [front page](https://github.com/RalfKellner/digital_reporting_demo/) of the GitHub repository. By clicking on the *Edit file* symbol, you can directly make changes to a file in the online repository. Note that this simulates a situation in which another team member has made changes to the online repository which are not yet included in your local repository.

To update your local repository you can call `git pull` from your terminal. In detail, this fetches updates from the online repository and merges them with your main branch. A detailed explanation can be found [here](https://www.w3schools.com/git/git_pull_from_remote.asp?remote=github). If your main branch is ahead of the origin in the online repository, you should push your committed changes by calling `git push origin main`, which pushes the commits of your local main branch to the remote repository named *origin*.
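The interplay of pushing and pulling can also be tried out without a GitHub account. In the sketch below, a bare repository in a temporary folder stands in for the online repository and a second clone plays the role of the team member. These stand-ins, the helper `git` and the demo identity are assumptions made for illustration only, and Git needs to be installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(cwd, *args):
    """Run one git command in `cwd` with a throwaway identity for commits."""
    cmd = ["git", "-c", "user.name=Demo User", "-c", "user.email=demo@example.com", *args]
    subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)


def visible_files(repo):
    """Return the sorted file names in the working folder, ignoring .git."""
    return sorted(p.name for p in repo.iterdir() if p.name != ".git")


def replay_push_and_pull():
    """Push to a stand-in remote, change it from a second clone, then pull."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        remote, local, teammate = root / "remote.git", root / "local", root / "teammate"

        # A bare repository plays the role of the online repository.
        git(root, "init", "--bare", str(remote))
        git(remote, "symbolic-ref", "HEAD", "refs/heads/main")

        # Local repository with one commit, connected to the remote as "origin".
        local.mkdir()
        git(local, "init")
        git(local, "symbolic-ref", "HEAD", "refs/heads/main")
        (local / "example_notebook.ipynb").write_text("{}")  # placeholder file
        git(local, "add", "--all")
        git(local, "commit", "-m", "add example notebook")
        git(local, "remote", "add", "origin", str(remote))
        git(local, "push", "origin", "main")

        # A team member clones the remote, adds a file and pushes it.
        git(root, "clone", str(remote), str(teammate))
        (teammate / "Readme.md").write_text("# Demo project\n")
        git(teammate, "add", "Readme.md")
        git(teammate, "commit", "-m", "add readme")
        git(teammate, "push", "origin", "main")

        # The local repository is behind the remote until we pull.
        before_pull = visible_files(local)
        git(local, "pull", "origin", "main")
        return before_pull, visible_files(local)


pull_result = replay_push_and_pull()
print(pull_result)
```

Before the pull the local folder only contains the notebook. After the pull it also contains the *Readme.md* file which the simulated team member pushed.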

The only thing left is how to work with branches on GitHub. To create a branch on GitHub in your online repository, you can click the *main* branch button on the left below your repository's name. When you type a new name, the create branch option pops up, which you can use to create the branch in the online repository. Once created, it is already active. Now as before, make some changes in the online repository and commit them online to the new branch. If we now call `git pull` in our local repository, the output tells us that a new branch has been identified. However, calling `git branch` locally does not show the new branch. But if you call `git branch -a`, the branch of the online repository should be visible. Use `git checkout online_branch_name` to switch to the state of the online branch. If you examine the status, you will see that this branch is up to date, and calling `git branch` now shows you that the branch is available as if you had created it locally. You can now continue working on this branch, and once you are done you can merge it into your local main branch, which will then be ahead of its online counterpart.

If you start with another local branch, work on it and want to push it to GitHub, just use `git push origin branch_name` while *branch_name* is active in your local repository. After the branch is available in the online repository, you can follow the hint which GitHub displays to open a pull request and merge the changes into the main branch of the online repository.

## Streamlit

Streamlit is an open-source app framework designed specifically for creating and sharing data apps built in Python. It is particularly popular among data scientists and machine learning practitioners due to its simplicity and ease of use. Streamlit allows users to transform data scripts into interactive web applications without requiring any web development experience.

### Key Concepts of Streamlit

1.	Widgets:
Streamlit provides a variety of widgets for user interaction, such as sliders, buttons, text inputs, and file uploaders. These widgets allow users to interact with the app and dynamically change its behavior.

2.	Reactive Programming:
Streamlit automatically updates the app when an input widget’s state changes, re-running the script from top to bottom. This approach simplifies the creation of reactive applications.

3.	Data Display:
Streamlit supports displaying various data formats, such as tables, charts, and images. It seamlessly integrates with popular libraries like Pandas and Matplotlib.

4.	Layout Management:
Streamlit allows you to structure the layout of your app using columns and expanders. This feature helps in organizing the app’s content more effectively.

5.	Theming and Customization:
Streamlit enables customization of the app’s appearance through theming options. Users can customize the sidebar, primary color, background color, and more.

6.	Interactivity with Plots:
Streamlit can display interactive plots from libraries like Plotly and Altair, while Matplotlib figures are rendered as static images. Interactive visualizations enhance the user experience by allowing dynamic exploration of data.

Streamlit simplifies the process of creating interactive and data-driven web applications in Python. With its intuitive API, reactive programming model, and rich support for data visualization, Streamlit is an excellent choice for data scientists and developers looking to share their work in an interactive format. By leveraging widgets, layout management, and customization options, users can build sophisticated apps quickly and efficiently.

### Installation and basic usage

To install streamlit via pip, simply use:

```bash
pip install streamlit
```

Other ways to install Streamlit can be found [here](https://docs.streamlit.io/get-started/installation). Streamlit apps are not run from a notebook but from a script. A reasonable approach is to create a new folder, create a virtual environment in this folder and install Streamlit and other necessary packages in it.

The Streamlit app is going to be written in a .py script file. Once this is created, we can run the script from the command line by:

```bash
streamlit run my_first_streamlit_app.py
```

Once you execute this, a local server will be started and the app opens in a new tab of your default browser. The content of the app is defined by you with the help of different widgets. For instance, if the Python script looks like this:

```python
import streamlit as st

st.write("Hello world!")
```

an app will open with the text written as above. Of course, widgets can do much more than simply write text. An overview of the API documentation can be found [here](https://docs.streamlit.io/develop/api-reference). Examples are the title, header, subheader and markdown widgets for structuring the app and including text content, or the data and chart widgets for showing data and graphics in the app. Input widgets allow for user interaction.

If you run the app for the first time, you can select the *Always rerun* option (or activate *Run on save* in the settings of the app). Consequently, whenever you change the app's script and save it, the app will be updated automatically. This allows you to develop your app in a very interactive work process.

### Basic widgets

By default, streamlit uses "magic commands" if you do not actively use widgets. This means if your script looks something like this:

```python
import streamlit as st
import numpy as np
import pandas as pd


welcome_string = "Hello world! This is my first app."
welcome_string

df = pd.DataFrame(data = np.random.rand(6).reshape(3, 2), index = [1, 2, 3], columns = ["A", "B"])
df
```

it will show the string and the pandas data frame in the web app. Secondly, you can also use the st.write() widget which evaluates the type of the instance which is passed to it and renders it in the best possible way.

However, often it is preferable to use type specific widgets for rendering them, as this gives the user more control regarding the way instances are rendered. For instance, if you use

```python
import streamlit as st
import numpy as np
import pandas as pd

welcome_string = "Hello world! This is my first app."
st.text(welcome_string, help = "Be creative!")

df = pd.DataFrame(data = np.random.rand(6).reshape(3, 2), index = [1, 2, 3], columns = ["A", "B"])
st.dataframe(df, use_container_width=True)
```

you can attach a tooltip with a helping comment to the text and display the data frame with a width that automatically adjusts to its container.

### Layout widgets

Layout widgets are important for structuring your app. For instance, we are going to include some interactive elements into the app later. To keep this input at, e.g., the left side of the app, we can use

```python
import streamlit as st

with st.sidebar:
    # everything inside this block is displayed in the sidebar on the left
    st.write("This text appears in the sidebar.")
```

Another example is the `st.columns` widget. Imagine you want to show two graphics or tables next to each other. This would be achieved by

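```python
import numpy as np
import pandas as pd
import streamlit as st

# two small example tables to show next to each other
df1 = pd.DataFrame(np.random.rand(3, 2), columns=["A", "B"])
df2 = pd.DataFrame(np.random.rand(3, 2), columns=["C", "D"])

col1, col2 = st.columns(2)  # two columns of equal width
with col1:
    st.dataframe(df1)
with col2:
    st.dataframe(df2)
```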

### Input widgets

Input widgets are very useful for transforming your app into an interactive experience for yourself and other users. Input widgets can be treated like variables themselves. This example

```python
import streamlit as st
import numpy as np
import pandas as pd


with st.sidebar:
    name = st.text_input(label = "Please provide your name here:", value = "")

welcome_string = f"Welcome {name}, what a lovely name!"
st.text(welcome_string)
```

receives the entered name and stores it in the `name` variable. Afterwards, this can be used to write a welcome message to the app which includes the user's name. Alternatively, you can give each input widget a unique key and retrieve its value from `st.session_state` under that key.

```python
import streamlit as st
import numpy as np
import pandas as pd


with st.sidebar:
    st.text_input(label = "Please provide your name here:", value = "", key = "name")

welcome_string = f"Welcome {st.session_state['name']}, what a lovely name!"
st.text(welcome_string)
```

There are many different input widgets available which can be found in the API's documentation as well.

### Advanced concepts

The best way to learn Streamlit is to get your hands dirty and create different apps. Before you do this, knowing about a few more advanced concepts might be helpful at times. The first one is *caching*, which is useful if your app includes downloading data or computationally intensive calculations. To use caching you use the `@st.cache_data` decorator, which can and should be used for all serializable data objects such as str, int, float, DataFrame, dict, list. Caching remembers whether a computation or data download with the same input parameters has already been conducted before. If this is the case, the previous results are stored in the cache and can be retrieved immediately. For instance, if the function below is executed twice with the same company_id and starting_date, the data is only downloaded the first time and taken from the cache the second time.

```python
import streamlit as st

@st.cache_data  # results are cached per combination of input arguments
def collect_data_for_company(company_id, starting_date):
    # collect and return data for a company with a specific id starting at a specific date
    ...
```
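What caching means can be made visible with a simplified model in plain Python. This is not how Streamlit implements its decorator. The decorator `simple_cache` and the counter `download_count` are hypothetical names for this sketch, which only handles positional, hashable arguments.

```python
import functools


def simple_cache(func):
    """Remember results per argument combination (simplified model)."""
    stored = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in stored:
            stored[args] = func(*args)  # first call: do the expensive work
        return stored[args]  # repeated call: reuse the stored result

    return wrapper


download_count = 0  # counts how often the function body really runs


@simple_cache
def collect_data_for_company(company_id, starting_date):
    """Stand-in for a slow download; it only returns its inputs."""
    global download_count
    download_count += 1
    return {"company_id": company_id, "starting_date": starting_date}


first = collect_data_for_company("ABC", "2020-01-01")
second = collect_data_for_company("ABC", "2020-01-01")  # served from the cache
third = collect_data_for_company("XYZ", "2020-01-01")  # new input, runs again
print(download_count)
```

The function body runs only twice for the three calls, because the second call repeats the arguments of the first one.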

Another caching decorator is the `@st.cache_resource` decorator, which is used for caching global resources such as machine learning models or database connections. For instance, the transformers package gives us the opportunity to download and use language models for certain purposes. These models are often large and it takes some time to download and load them. If we just included code to load and use the model in the script, it would be loaded again every time a user changes the value of an input widget. However, this code

```python
import streamlit as st
from transformers import pipeline

@st.cache_resource  # 👈 Add the caching decorator
def load_model():
    return pipeline("sentiment-analysis")

model = load_model()
input_text = st.text_input("Provide a text for sentiment classification.", value="Python and streamlit are the best!")

if input_text:
    result = model(input_text)[0]  
    st.write(result)
```

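only loads the model once and is able to use it afterwards for an arbitrary number of sentence classification inputs by the user. The cached model is shared across reruns and sessions, so it is loaded when the function is called for the first time and reused until the app is restarted or the cache is cleared.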

Besides caching, it is useful to know about session states. A session is a single instance of viewing the app in a browser tab. Once the browser page is refreshed, the session state is reset. We have already seen that the values of input widgets are saved in the session state if a key is provided. The concept can be used in a broader way, e.g., to keep values in the background across reruns of the script. For instance, in the app below the seed is stored in the session state and increased by one on every rerun, so each time the user presses the button, data is sampled with a different but known random seed. After she resets the session, the seed starts at 1 again, so pressing the button reproduces the same sequence of random numbers.

```python
import streamlit as st
import numpy as np
import pandas as pd


with st.sidebar:
    st.button("Press this button to sample new random variables.", key = "random_button")

if "seed" not in st.session_state:
    st.session_state.seed = 1
else:
    st.session_state.seed += 1

np.random.seed(st.session_state.seed)
st.write(f"Data are sampled with random seed: {st.session_state.seed}")
x = np.random.randn(10)
st.write(x)
```
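Why the numbers repeat after a refresh can be checked with a small model in plain Python. The sketch rests on assumptions: a plain dictionary stands in for `st.session_state`, one call of the hypothetical function `run_script` stands in for one rerun of the app, and the standard library `random` module replaces NumPy to keep the example free of dependencies.

```python
import random


def run_script(session_state):
    """Model one top-to-bottom run of the app script."""
    if "seed" not in session_state:
        session_state["seed"] = 1
    else:
        session_state["seed"] += 1
    rng = random.Random(session_state["seed"])  # generator with the stored seed
    return session_state["seed"], [rng.random() for _ in range(3)]


# First session: the initial run followed by two button presses.
session = {}
first_session = [run_script(session) for _ in range(3)]

# Refreshing the browser page starts a new session with an empty state.
session = {}
second_session = [run_script(session) for _ in range(3)]

seeds = [seed for seed, _ in first_session]
print(seeds)
print(first_session == second_session)
```

Within a session the seed increases with every run, and the second session reproduces exactly the same numbers because its state starts from scratch.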

### Multipage apps

Streamlit is not restricted to web apps with just one page. Multipage apps can easily be generated by including a pages/ directory next to an entrypoint file (the py script which defines the landing page of your web app). Each py script in the pages/ directory will create an additional page for the app. Page labels and URLs are automatically built from the filenames.

However, it is recommended to use st.Page and st.navigation if you want more flexibility. For a project, your folder may look like this:

* my_project
    * main.py
    * page_one.py
    * page_two.py

The *page_one.py* and *page_two.py* scripts include the content for each page. Within the *main.py* script you can create pages which are based upon these scripts. Next, you can define the navigation for the site with Streamlit's `st.navigation` function.

```python
import streamlit as st

page_one = st.Page("page_one.py", title = "Page One")
page_two = st.Page("page_two.py", title = "Page Two")

pg = st.navigation([page_one, page_two])
pg.run()
```