Kbarias/BigData_Project2
Prerequisites for running on Jupyter Notebook (recommended; running from the terminal gives no decimal precision)
1. Have Java installed (specifically version 8)
2. Have Apache Spark downloaded
3. Have Anaconda installed
4. Have Jupyter Notebook installed (useful link: https://medium.com/@naomi.fridman/install-pyspark-to-run-on-jupyter-notebook-on-windows-4ec2009de21f)

How to run the project on Jupyter Notebook
1. Open Jupyter Notebook via Anaconda
2. Open tf-idf.py in a Jupyter Notebook
3. Press the 'Run' button

How to run the project in terminal
1. Navigate to the folder where the tf-idf.py file is downloaded
2. Comment out these two lines in the file: 'import findspark' and 'findspark.init()'
3. Type into terminal: spark-submit tf-idf.py
4. Press Enter

Note: to get tf x idf for terms with the pattern 'gene_xxx_gene', uncomment the line: #print(filtered_terms.collect())
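
For readers who want to see what a tf x idf value for a 'gene_xxx_gene' term looks like, here is a minimal pure-Python sketch. It does not reproduce tf-idf.py and does not use Spark. The toy corpus, the tf_idf helper, the regular expression, and the choice of tf = count / document length and idf = log10(N / df) are all assumptions made for illustration; the actual script may tokenize, weight, or filter differently. Only the name filtered_terms is borrowed from the README.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (not from the project): document id -> space-separated terms.
docs = {
    "d1": "gene_abc_gene binds gene_xyz_gene",
    "d2": "gene_abc_gene expression level",
    "d3": "protein expression level",
}


def tf_idf(docs):
    """Return {(doc_id, term): tf * idf}.

    Assumed definitions: tf = term count / document length,
    idf = log10(number of documents / number of documents containing the term).
    """
    n_docs = len(docs)
    term_counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    scores = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            idf = math.log10(n_docs / doc_freq[term])
            scores[(doc_id, term)] = (count / total) * idf
    return scores


# Assumed reading of the 'gene_xxx_gene' pattern: starts with "gene_", ends with "_gene".
GENE_PATTERN = re.compile(r"^gene_.+_gene$")

scores = tf_idf(docs)
filtered_terms = {
    key: value for key, value in scores.items() if GENE_PATTERN.match(key[1])
}

for (doc_id, term), score in sorted(filtered_terms.items()):
    print(doc_id, term, round(score, 4))
```

In this sketch a term that appears in every document gets idf = 0, so its tf x idf is 0 regardless of how often it occurs.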