Clippy 🔎

About 💡

We took on this project because, as college students, most of us spend much of our time reading documents. Working within the project guidelines, we built a smart PDF reader: given a PDF or TXT file, it offers features that help the reader understand the document more fully.

How It Works 📖

Clippy takes a PDF and displays its contents, a summary, and its headings in a straightforward user interface. Summaries are generated using tokenization, count vectorization, TF-IDF weighting, and multinomial naive Bayes classification. The program also predicts the category of the given text (see summarizer.py for more information).
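To make the tokenization, count vectorization, and TF-IDF steps concrete, here is a minimal standard-library sketch of an extractive summarizer that scores each sentence by the average TF-IDF weight of its words. It is an illustration under stated assumptions, not the code in summarizer.py: the function names, the regex-based sentence splitter, and the smoothed IDF formula are all choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: how many sentences contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)  # count vectorization for one sentence
        # Smoothed IDF (an assumption of this sketch), so common terms still count a little.
        total = sum(c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items())
        scores.append(total / len(tokens) if tokens else 0.0)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Sentences built from rare words score higher than sentences that repeat common ones, which is why TF-IDF is a reasonable first signal for picking summary sentences.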

Quickstart ⏩

Using your preferred shell and the GitHub CLI (gh), the steps are as follows:

➊ Create and move to a new directory.

mkdir clippy-clone

cd clippy-clone

➋ Clone the repo using the GitHub CLI.

gh repo clone jwc524/clippy

Dependencies 📦

Installation ⚙️

To install each dependency, use the following structure:

pip install <package>

However, as mentioned under Dependencies, pymupdf must be pinned to version 1.18.17:

pip install pymupdf==1.18.17

Alternatively:

python3 -m pip install -U pymupdf==1.18.17
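Because the program depends on that exact release, it can help to verify the pin before launching. The following is a small sketch using only the standard library (Python 3.8+); the helper names `installed_version` and `check_pin` are hypothetical and are not part of Clippy.

```python
from importlib import metadata

REQUIRED_PYMUPDF = "1.18.17"  # the version pinned in the install commands above


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package, required):
    """Return (ok, message) describing whether the required version is installed."""
    found = installed_version(package)
    if found is None:
        return False, f"{package} is not installed; run: pip install {package}=={required}"
    if found != required:
        return False, f"{package} {found} found, but {required} is required"
    return True, f"{package} {required} is installed"


if __name__ == "__main__":
    ok, message = check_pin("pymupdf", REQUIRED_PYMUPDF)
    print(message)
```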

For help with repository cloning, refer to Quickstart ⏩.

Directories 🗂

pdfs/

The pdfs/ directory contains sample PDFs to use with Clippy.

reader/

The reader/ directory contains the main Python scripts for the program.

future/

The future/ directory contains work-in-progress scripts of upcoming features.

Features 🪴

headings.py

Headings parses the PDF for its headings, using the document's embedded outline when one already exists. It primarily functions as a GUI class.
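As a rough illustration of the outline case, the sketch below reads a PDF's embedded table of contents with PyMuPDF and formats it as indented text. It is not the code in headings.py: `format_outline` and `read_outline` are hypothetical names, and the sketch assumes the snake_case `Document.get_toc()` method is available in the pinned PyMuPDF release.

```python
def format_outline(toc):
    """Turn [level, title, page] entries into indented text lines."""
    lines = []
    for level, title, page in toc:
        indent = "  " * (level - 1)  # level 1 is flush left
        lines.append(f"{indent}{title} (p. {page})")
    return lines


def read_outline(pdf_path):
    """Load a PDF's embedded outline; returns an empty list if it has none."""
    import fitz  # PyMuPDF, imported lazily so format_outline works without it

    doc = fitz.open(pdf_path)
    try:
        return doc.get_toc()  # simple form: a list of [level, title, page]
    finally:
        doc.close()
```

A document without an outline yields an empty list, which is the point where a heading parser would have to fall back to inspecting the page text itself.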

main.py

Main is the bulk of the program, handling the user interface and calls to other functions.

merging.py

Merging handles the PDF merging calls from main.py. It primarily functions as a GUI class.
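For readers curious what a merge involves underneath the GUI, here is a minimal sketch that concatenates several PDFs with PyMuPDF. The function name `merge_pdfs` is hypothetical, and this is one plausible approach rather than the logic in merging.py; it assumes the snake_case `Document.insert_pdf()` method is available in the pinned PyMuPDF release.

```python
def merge_pdfs(input_paths, output_path):
    """Append every page of each input PDF, in order, and save the result."""
    if not input_paths:
        raise ValueError("at least one input PDF is required")

    import fitz  # PyMuPDF, imported lazily so the argument check needs no install

    merged = fitz.open()  # a new, empty PDF
    try:
        for path in input_paths:
            src = fitz.open(path)
            try:
                merged.insert_pdf(src)  # copies all pages of src to the end
            finally:
                src.close()
        merged.save(output_path)
    finally:
        merged.close()
    return output_path
```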

rotating.py

Rotating handles PDF rotation as controlled by the user. It primarily functions as a GUI class.
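Rotation can be sketched in a few lines as well. PDF page rotation is stored as a multiple of 90 degrees, so the example below normalizes the angle before applying it. The names `normalize_rotation` and `rotate_pdf` are hypothetical and this is not the code in rotating.py; it assumes `Page.set_rotation()` and `Page.rotation` are available in the pinned PyMuPDF release.

```python
def normalize_rotation(current, delta):
    """Return the new page rotation, one of 0, 90, 180, or 270."""
    if delta % 90 != 0:
        raise ValueError("PDF pages rotate in multiples of 90 degrees")
    return (current + delta) % 360


def rotate_pdf(input_path, output_path, delta=90):
    """Rotate every page by delta degrees (clockwise if positive) and save a copy."""
    import fitz  # PyMuPDF, imported lazily so normalize_rotation works without it

    doc = fitz.open(input_path)
    try:
        for page in doc:
            page.set_rotation(normalize_rotation(page.rotation, delta))
        doc.save(output_path)
    finally:
        doc.close()
```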

summarizer.py

Summarizer parses the PDF and generates a summary using NLP methods. It also generates a number of graphs based on the extracted text.
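The category prediction mentioned under How It Works relies on multinomial naive Bayes. The sketch below implements that classifier over word counts using only the standard library, with add-one smoothing. It is a teaching example, not the code in summarizer.py: the class name `TinyMultinomialNB`, the tokenizer, and the training texts in the usage note are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyMultinomialNB:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> term counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after fitting on a handful of labeled sentences, `model.predict("investors sold the stock")` returns whichever label makes those words most probable. A real category model would need far more training text than a toy example provides.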

Future Plans 🔮

Because this project was created in a limited amount of time, several improvements remain to be made:

Creating a more responsive, fully featured GUI
Improving the data mining features
Implementing more user-friendly features
Extracting images and data tables for easy access
Integrating the Google Scholar API and JSTOR

Credits 📜

This project was written by Ryan Truong, Tony Nguyen, and Jonathan Cole.

Warning ⚠

The application takes a long time to start up the first time it runs.
The program will not run correctly without the correct version of PyMuPDF (1.18.17).

This project was completed in fulfillment of the requirements of CSC 3400 at Belmont University. Special thanks to Dr. Esteban Parra Rodriguez.
