# **Biof395 Final Project**
Shane Chambers

## **Overview and Description**

The goal of this project is to build a dynamic application that summarizes the published literature on a miRNA supplied by the user. This requires the program to:
- Receive user input
- Query and download the relevant literature
- Process the literature
- Transform the literature into feature representations
- Build a text mining model
- Evaluate the performance of the model

We will work through each aspect of this process below.

## **Receive User Input**

First, we will prompt the user to enter the [miRBase](http://www.mirbase.org/) accession of their miRNA of interest. This is a unique code associated with every known miRNA that standardizes the nomenclature to avoid confusion.

In [1]:
# Strip stray whitespace so the later exact-match lookup is not thrown off
user_mir = input('Enter the miRBase accession number of your miR of interest: ').strip()

# For this example, we will enter the accession of mmu-mir-100: MI0000692

Enter the miRBase accession number of your miR of interest: MI0000692
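
Because the lookup later in this notebook is an exact, case-sensitive string match, it can help to clean up the input before using it. The sketch below is one possible approach, not part of the original workflow. It assumes that accessions consist of `MI` (hairpin) or `MIMAT` (mature) followed by seven digits, which matches the example above but should be checked against miRBase; `normalize_accession` and `ACCESSION_PATTERN` are hypothetical names.

```python
import re

# Assumption: accessions look like "MI" or "MIMAT" followed by seven digits.
ACCESSION_PATTERN = re.compile(r"MI(MAT)?\d{7}")


def normalize_accession(raw):
    """Return a cleaned accession string, or None if it does not fit the assumed pattern."""
    cleaned = raw.strip().upper()
    return cleaned if ACCESSION_PATTERN.fullmatch(cleaned) else None


print(normalize_accession("  mi0000692 "))  # MI0000692
print(normalize_accession("mir-100"))       # None
```

Returning `None` rather than raising an error leaves it to the caller to decide whether to prompt the user again.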


Next, we will look this accession up in a table downloaded from the miRBase FTP site [containing every known miR and its accession number](ftp://mirbase.org/pub/mirbase/CURRENT/miRNA.xls.gz). This file has been downloaded and unzipped, and is saved in this folder as `miRNA.xlsx`. Each row lists a hairpin entry (`Accession`, `ID`) alongside its `Mature1` and `Mature2` entries, so below we import the file with `pandas` and stack the three accession/ID column pairs into a single two-column table that `user_mir` can be queried against.

In [14]:
import pandas as pd

# Load the miRBase table (one row per hairpin entry)
mir_database = pd.read_excel('miRNA.xlsx')

# Pull out each accession/ID column pair and give them common column names
mir_database_1 = mir_database.loc[:, ['Accession', 'ID']]
mir_database_2 = mir_database.loc[:, ['Mature1_Acc', 'Mature1_ID']].rename(columns={'Mature1_Acc': 'Accession', 'Mature1_ID': 'ID'})
mir_database_3 = mir_database.loc[:, ['Mature2_Acc', 'Mature2_ID']].rename(columns={'Mature2_Acc': 'Accession', 'Mature2_ID': 'ID'})

# Stack the three pairs into one long lookup table
final_database = pd.concat([mir_database_1, mir_database_2, mir_database_3])

final_database.head()

| | Accession | ID |
|---|---|---|
| 0 | MI0000001 | cel-let-7 |
| 1 | MI0000002 | cel-lin-4 |
| 2 | MI0000003 | cel-mir-1 |
| 3 | MI0000004 | cel-mir-2 |
| 4 | MI0000005 | cel-mir-34 |
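
The stacking step can leave rows with no accession (for entries that lack a second mature sequence) and repeated index labels, since each of the three pieces keeps its original row numbers. The sketch below shows one way to tidy this up on a toy table. The column names follow the cell above; the mature accessions and IDs in `toy` are made up for illustration, and `build_lookup` is a hypothetical helper, not something from the original notebook.

```python
import pandas as pd

# Toy stand-in for miRNA.xlsx. The Mature* values are invented placeholders.
toy = pd.DataFrame({
    'Accession':   ['MI0000001', 'MI0000002'],
    'ID':          ['cel-let-7', 'cel-lin-4'],
    'Mature1_Acc': ['MIMAT9000001', 'MIMAT9000002'],
    'Mature1_ID':  ['toy-mature-a', 'toy-mature-b'],
    'Mature2_Acc': ['MIMAT9000003', None],  # second entry has no Mature2
    'Mature2_ID':  ['toy-mature-c', None],
})


def build_lookup(wide):
    """Stack the three accession/ID column pairs into one two-column table."""
    pairs = [('Accession', 'ID'),
             ('Mature1_Acc', 'Mature1_ID'),
             ('Mature2_Acc', 'Mature2_ID')]
    frames = [
        wide.loc[:, [acc, name]].rename(columns={acc: 'Accession', name: 'ID'})
        for acc, name in pairs
    ]
    stacked = pd.concat(frames, ignore_index=True)
    # Drop rows without an accession and renumber the index from zero
    return stacked.dropna(subset=['Accession']).reset_index(drop=True)


lookup = build_lookup(toy)
print(len(lookup))  # 5
```

Using `ignore_index=True` gives every row a unique label, which avoids surprises if the table is later indexed with `.loc`.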


Next, we query `user_mir` against this table and store the corresponding `ID` as the term we will search for in PubMed. We also check that exactly one `ID` corresponds to the given accession number.

In [28]:
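# Keep only the ID(s) whose accession exactly matches the user's input
filtered_database = final_database[final_database['Accession'] == user_mir]['ID']

# Proceed only when the accession maps to exactly one ID
if filtered_database.size == 1:
    mir = filtered_database.iloc[0]
    print('The accession number ' + user_mir + ' corresponds to miR ' + mir)
else:
    print('miR accession is incorrect. Try again (the accession is case-sensitive)')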

The accession number MI0000692 corresponds to miR mmu-mir-100
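
If the lookup needs to be repeated, for example to ask the user again after a failed match, it may be convenient to wrap it in a function. The following is a minimal sketch under that assumption; `find_mir_id` is a hypothetical name, and `demo` is a two-row stand-in built from rows shown earlier in this notebook rather than the full table.

```python
import pandas as pd


def find_mir_id(accession, database):
    """Return the ID for an accession, or None unless exactly one row matches."""
    matches = database.loc[database['Accession'] == accession, 'ID']
    if matches.size != 1:
        return None
    return matches.iloc[0]


# Small stand-in for final_database
demo = pd.DataFrame({
    'Accession': ['MI0000001', 'MI0000002'],
    'ID': ['cel-let-7', 'cel-lin-4'],
})

print(find_mir_id('MI0000001', demo))  # cel-let-7
print(find_mir_id('mi0000001', demo))  # None (the match is case-sensitive)
```

With a function like this, the check above reduces to testing whether the return value is `None`, and `mir` is never left undefined when the accession is not found.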
