# B-allele Plot in a Streamlit App Running in Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Eitan177/EitanAmrom/blob/main/plot_baf.ipynb)

**1. Install Dependencies:**

- `!pip install -q streamlit`: Installs Streamlit, the library used to build the interactive web app. The `-q` flag suppresses most of pip's output.
- `!pip install psutil`: Installs psutil, a system and process monitoring library. The current code does not use it directly.

In [None]:
!pip install -q streamlit
!pip install psutil

The next cell writes `app.py`, a Streamlit application that plots allele frequencies from tab-separated variant tables (`.txt`, `.maf`, or `.table` files), such as those derived from a variant call format (VCF) file. The sections below walk through the code:

**2. Streamlit App Setup (`app.py`):**

- `st.set_page_config(layout='wide')`: Configures the Streamlit page layout to utilize the entire browser width, enhancing the visualization.
- `st.title(...)`: Sets the title of the web app.
- File Uploader: `st.file_uploader(...)`:  Creates an interactive file upload widget in the app. Users can upload files (txt, maf, table formats) containing variant data.
- Checkboxes and Radio Buttons: `st.checkbox(...)` and `st.radio(...)` create interactive controls to filter and customize the plot:
    - `onlysnps`: Filters for SNPs (single nucleotide polymorphisms).
    - `usegenomicCoordinate`: Toggles whether to use genomic coordinates or indices for the x-axis.
    - `colorselection`: Lets the user choose to color data points by 'Gene' or 'DBSNP' identifier.

**3. Data Processing:**

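- Conditional Data Reading: The code picks a pandas reader based on the uploaded file's MIME type and name:
    - `pd.read_table()`: For plain-text files, and for files whose name contains `table` (for the latter, the first row is skipped).
    - `pd.read_csv(..., sep='\t')`: For everything else, which is treated as a MAF file (the first row is skipped).
    - `try...except` blocks provide fallbacks when an expected column is missing: `DP4` versus `t_alt_count`/`t_depth` for the allele frequency, and `rsID` versus `dbSNP_RS` for the SNP filter.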
- Data Cleaning and Transformation:
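    - Calculates allele frequency ('AF'): for MAF files from the `DP4` read counts, falling back to `t_alt_count / t_depth`; for table files it is copied from the `allele_frequency` column.
    - Copies columns into the consistent names 'AF', 'POS', 'GENE', and 'CHROM' across file types.
    - Filters rows based on the `onlysnps` checkbox, keeping variants whose dbSNP identifier contains 'rs'.
    - Sets up an 'ind' column for the x-axis: the genomic position by default, replaced by a per-chromosome row index unless `usegenomicCoordinate` is checked.
    - Builds the 'gene_v_snp' column used to color points, from the gene or the dbSNP identifier according to `colorselection`.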
- Plotting with Streamlit:
    - `st.columns(...)`: Divides the page into three columns for parallel visualization of different chromosomes.
    - Iterates through unique chromosome values (`np.unique(chart_data['CHROM'])`) and plots a scatterplot for each.
        - `st.scatter_chart()`: Generates an interactive scatterplot. The x-axis is 'ind' (row index or genomic coordinate) and the y-axis is 'AF'. Point size is fixed at 30, and points are colored by the 'gene_v_snp' column.
        - The plots are then positioned in different columns as calculated by the `m % 3` logic.
- Display Dataframe:  `st.write(chart_data)` displays the underlying dataframe as a table for further inspection.
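
The allele-frequency step can be illustrated outside of pandas. The sketch below assumes `DP4` follows the usual samtools/bcftools layout (ref-forward, ref-reverse, alt-forward, alt-reverse read counts). The function names `af_from_dp4` and `af_from_counts` are hypothetical, and the zero-depth guard is an addition that the notebook code does not have.

```python
from typing import Optional


def af_from_dp4(dp4: str) -> Optional[float]:
    """Allele frequency from a DP4 string 'ref_fwd,ref_rev,alt_fwd,alt_rev'."""
    counts = [int(c) for c in dp4.split(",")]
    if len(counts) != 4:
        raise ValueError(f"expected 4 comma-separated counts, got {dp4!r}")
    total = sum(counts)
    # Alt-supporting reads are the last two fields (forward + reverse strand).
    # Return None instead of dividing by zero when no reads are present.
    return sum(counts[2:4]) / total if total else None


def af_from_counts(alt_count: int, depth: int) -> Optional[float]:
    """Allele frequency from MAF-style alt read count and total depth."""
    return alt_count / depth if depth else None
```

For example, `af_from_dp4("10,12,5,3")` divides the 8 alt-supporting reads by 30 total reads, which mirrors what the `DP4` branch in `app.py` computes row by row.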

In [None]:
%%writefile app.py
import streamlit as st
import pandas as pd
import numpy as np
import re
st.set_page_config(layout='wide')
st.title('Plot allele frequency column from vcf file')
mvfe=st.file_uploader('Upload master variant file extreme here',type=['txt','maf','table'])
onlysnps=st.checkbox('only SNPs')
usegenomicCoordinate=st.checkbox('Use genomic coordinates instead of indices')
colorselection=st.radio('color points using:', ['Gene','DBSNP'])

if mvfe is not None:
    st.write(mvfe.name)
    if mvfe.type == "text/plain":
        st.write('reading text file')
        chart_data = pd.read_table(mvfe,sep='\t')
    elif re.findall('table',mvfe.name):
        st.write('reading table file')
        chart_data = pd.read_table(mvfe,sep='\t',skiprows=1)
        chart_data['AF']=chart_data['allele_frequency']
        chart_data['POS']=chart_data['position']
        chart_data['GENE']=chart_data['alt_count']
        chart_data['CHROM']=chart_data['contig']
    else:
        st.write('reading maf file')
        chart_data = pd.read_csv(mvfe,sep='\t',skiprows=1)
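        try:
            # DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse read counts
            dp4=chart_data['DP4'].str.split(',',expand=True).astype(int)
            chart_data['AF']=dp4.iloc[:,2:4].sum(axis=1)/dp4.sum(axis=1)
        except (KeyError, ValueError, AttributeError):
            # DP4 missing or unparsable: fall back to the MAF tumor read counts
            chart_data['AF']=chart_data['t_alt_count']/chart_data['t_depth']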
        chart_data['POS']=chart_data['Start_Position']
        chart_data['GENE']=chart_data['Hugo_Symbol']
        chart_data['CHROM']=chart_data['Chromosome']
    if onlysnps:
        # Keep rows whose dbSNP identifier contains 'rs'; the column name depends on the input file
        rscol='rsID' if 'rsID' in chart_data.columns else 'dbSNP_RS'
        chart_data=chart_data[chart_data[rscol].astype(str).str.contains('rs')].copy()
    chart_data['ind']=  chart_data['POS']
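    if colorselection=='Gene':
        chart_data['gene_v_snp']=chart_data['GENE'].astype(str)
    else:
        # The dbSNP column is named 'rsID' in some inputs and 'dbSNP_RS' in others
        snpcol='rsID' if 'rsID' in chart_data.columns else 'dbSNP_RS'
        chart_data['gene_v_snp']=chart_data[snpcol].astype(str)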

    # Display one scatterplot per chromosome, cycling through three page columns
    cols=st.columns(3)
    for m,a in enumerate(np.unique(chart_data['CHROM'])):
        # Copy the subset so that overwriting 'ind' does not write into a view of chart_data
        chrom = chart_data[chart_data['CHROM']==a].copy()
        if not usegenomicCoordinate:
            chrom['ind']=np.arange(0,chrom.shape[0])
        chrom.reset_index(inplace=True)
        # Label the x-axis according to the checkbox; str(a) also handles numeric chromosome names
        xlabel='genomic coordinate' if usegenomicCoordinate else 'index'
        with cols[m % 3]:
            st.title(str(a)+' x-axis '+xlabel)
            st.scatter_chart(chrom,x='ind', y='AF',size=30,color='gene_v_snp')
    st.write(chart_data)

The command `!wget -q -O - https://loca.lt/mytunnelpassword` downloads the content of the URL `https://loca.lt/mytunnelpassword` and prints it to the standard output without any progress indication. Let's break down the options:

* `!wget`: This invokes the `wget` command, a utility for downloading files from the web.  The `!` indicates that this is a shell command being executed within the notebook environment.

* `-q`: This option makes `wget` run in quiet mode.  Normally, `wget` would print progress information (percentage downloaded, download speed) to the console.  `-q` suppresses this output.

* `-O -`: Specifies the output file for the downloaded content.  Using `-` as the filename redirects the output to the standard output (stdout) of the shell.  In other words, the downloaded content will be displayed directly in the output of the notebook cell.

* `https://loca.lt/mytunnelpassword`: The URL to download. The first time a localtunnel URL is opened in a browser, localtunnel shows a reminder page that asks for a tunnel password. This endpoint returns that password, which is the public IP address of the machine running the tunnel (here, the Colab VM).
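
The same request can be made from Python when `wget` is not available. This is a minimal sketch using only the standard library; `fetch_text` is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a URL and return its body as stripped text (like `wget -q -O -`)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Usage in a notebook cell (requires network access):
# print(fetch_text("https://loca.lt/mytunnelpassword"))
```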

In [None]:
!wget -q -O - https://loca.lt/mytunnelpassword

**`!streamlit run app.py &>/content/logs.txt &`**:

   - `!streamlit run app.py`: This command runs the Streamlit application defined in the `app.py` file.  Streamlit creates a web server to host the interactive application.
   - `&>/content/logs.txt`: This redirects both standard output (stdout) and standard error (stderr) from the `streamlit run` command to a file named `logs.txt` in the `/content/` directory. This is useful for debugging because any messages or errors from the Streamlit server will be captured in this log file instead of cluttering the Colab output.
   - `&`: This runs the `streamlit run` command in the background.  Without this, the Colab notebook cell would be blocked until the Streamlit server is manually stopped.  The `&` allows the cell execution to finish, and the Streamlit app continues to run independently.

**`!npx localtunnel --port 8501`**:

   - `!npx localtunnel`: This uses `npx` (a tool for executing Node packages) to run the `localtunnel` utility. `localtunnel` creates a public URL that forwards traffic to a local port on your machine (in this case, Colab's virtual machine).  This is how you can access the Streamlit app from outside the Colab environment.
   - `--port 8501`:  This specifies the local port that `localtunnel` should forward requests to. Streamlit typically runs on port 8501.
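
Because `streamlit run` is started in the background, the tunnel can come up before the server is listening. One way to avoid that race is to poll the port before starting the tunnel. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper (not part of Streamlit or localtunnel), and the timeout values are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused): wait briefly and retry.
            time.sleep(interval)
    return False


# Usage in a notebook cell, between the two shell commands:
# assert wait_for_port("localhost", 8501), "Streamlit did not start; see /content/logs.txt"
```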

In [None]:
!streamlit run app.py &>/content/logs.txt &
!npx localtunnel --port 8501