Kbarias/BigData_Project2

*Prerequisites for running in Jupyter Notebook (recommended; running in the terminal gives no decimal precision)
1. Have Java installed (specifically version 8)
2. Have Apache Spark downloaded
3. Have Anaconda installed
4. Have Jupyter Notebook installed (useful link: https://medium.com/@naomi.fridman/install-pyspark-to-run-on-jupyter-notebook-on-windows-4ec2009de21f)
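
As a quick sanity check of the prerequisites above, the following is a minimal sketch that looks for the relevant executables on PATH. The function name check_prerequisites and the command names java, spark-submit and jupyter are assumptions for illustration; the check does not verify that the Java version is 8, and a missing entry may only mean that the tool is installed but not on PATH.

```python
import shutil


def check_prerequisites(commands=("java", "spark-submit", "jupyter")):
    """Return a dict mapping each command name to True if it is found on PATH."""
    return {name: shutil.which(name) is not None for name in commands}


if __name__ == "__main__":
    # Print one line per tool so missing prerequisites are easy to spot.
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'NOT found on PATH'}")
```

Running the file directly prints one status line per tool.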

How to run the project on Jupyter Notebook
1. Open Jupyter Notebook via Anaconda
2. Open tf-idf.py in a Jupyter Notebook
3. Press the 'Run' button.


How to run the project in terminal
1. Navigate to the folder where tf-idf.py was downloaded
2. Comment out these two lines in the file: 'import findspark' and 'findspark.init()'
3. Type into the terminal: spark-submit tf-idf.py
4. Press Enter
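
Step 2 above is a manual edit. For illustration only, here is a minimal sketch of the same change applied to a string of source code. The helper name comment_out_findspark and the third line of the example input are hypothetical and are not taken from tf-idf.py.

```python
import re

# Matches the two findspark lines named in step 2, allowing leading whitespace.
_FINDSPARK_LINE = re.compile(r"^(\s*)(import findspark|findspark\.init\(.*\))\s*$")


def comment_out_findspark(source: str) -> str:
    """Return source with 'import findspark' and 'findspark.init()' commented out."""
    out = []
    for line in source.splitlines():
        match = _FINDSPARK_LINE.match(line)
        if match:
            indent, statement = match.groups()
            # Keep the original indentation and prefix the statement with '# '.
            out.append(f"{indent}# {statement}")
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical input; only the first two lines are named in the steps above.
example = "import findspark\nfindspark.init()\nfrom pyspark import SparkContext"
print(comment_out_findspark(example))
```

Lines that are already commented out are left untouched, so applying the helper twice gives the same result as applying it once.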

**To get the tf x idf values for terms matching the pattern 'gene_xxx_gene', uncomment the line: #print(filtered_terms.collect())
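
To make the idea of filtering tf x idf values by the 'gene_xxx_gene' pattern concrete, here is a small pure-Python sketch. It is not the code in tf-idf.py, which runs on Spark. The weighting used here (term count divided by document length, times the natural log of N over document frequency), the reading of 'xxx' as any non-empty run of characters, and all names and sample documents are assumptions.

```python
import math
import re
from collections import Counter

# Assumption: 'xxx' in 'gene_xxx_gene' stands for any non-empty run of characters.
GENE_PATTERN = re.compile(r"^gene_.+_gene$")


def tf_idf(documents):
    """Return {(doc_id, term): tf * idf} for a dict of doc_id -> non-empty term list."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter()
    for terms in documents.values():
        doc_freq.update(set(terms))
    scores = {}
    for doc_id, terms in documents.items():
        counts = Counter(terms)
        for term, count in counts.items():
            tf = count / len(terms)
            idf = math.log(n_docs / doc_freq[term])
            scores[(doc_id, term)] = tf * idf
    return scores


def filter_gene_terms(scores):
    """Keep only the entries whose term matches the gene pattern."""
    return {key: value for key, value in scores.items() if GENE_PATTERN.match(key[1])}


# Made-up sample documents for illustration.
docs = {
    "d1": ["gene_abc_gene", "binds", "gene_abc_gene", "protein"],
    "d2": ["gene_xyz_gene", "binds", "protein"],
}
filtered = filter_gene_terms(tf_idf(docs))
print(filtered)
```

Terms that occur in every document, such as "binds" here, get a score of zero under this weighting and are dropped by the filter anyway because they do not match the pattern.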
