SemanticApp is a web application that computes the semantic similarity between two text documents. Semantic similarity measures how much two texts discuss the same topics, considering the meaning behind the words rather than just their presence. The application embeds each document with DistilBERT, a pretrained transformer model, and compares the resulting embeddings with cosine similarity.
Semantic similarity refers to the degree of likeness between two pieces of text in terms of their meaning. It goes beyond simple word matching and takes the context of the content into account. SemanticApp calculates this similarity for two uploaded documents.
The purpose of SemanticApp is to provide users with a tool for assessing how closely related two pieces of text are in terms of content and meaning. This can be useful in various applications such as document comparison, plagiarism detection, and content recommendation.
SemanticApp utilizes the following technologies:
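- Cosine Similarity: The cosine of the angle between two non-zero vectors, used here as the similarity measure. It ranges from -1 (opposite directions) to 1 (same direction).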
- DistilBERT: A pretrained transformer model for natural language understanding.
- Streamlit: A Python library for creating interactive web applications.
- Python: The programming language used for building the application.
- NumPy: A library for numerical operations in Python.
- Vector Embeddings: Representations of text in a high-dimensional space used for semantic analysis.
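
As an illustration of the cosine similarity measure listed above, here is a minimal sketch in plain Python. The function name `cosine_similarity` is an assumption made for this example and is not necessarily what SemanticApp uses internally. The application lists NumPy, which would compute the same quantity in vectorized form.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two non-zero vectors.

    Hypothetical helper for illustration; not taken from SemanticApp's source.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    # Euclidean (L2) norms of each vector.
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Vectors pointing in the same direction score close to 1,
    # regardless of their lengths.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because the result depends only on direction, two documents of very different lengths can still score highly if their embeddings point the same way.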
- Upload Documents: Users upload two text documents (in PDF or DOCX format) through the web interface.
- Document Processing: The application reads the content of the documents using specialized functions for DOCX and PDF formats.
- Semantic Embeddings: The text content is converted into vector embeddings using the DistilBERT model. These embeddings capture the semantic meaning of the text.
- Cosine Similarity: Cosine similarity is calculated between the normalized embeddings of the two documents. This yields a semantic similarity score.
- User Feedback: The application displays the content of the documents and the computed similarity score. The score is color-coded based on its magnitude, providing an intuitive understanding of the relationship between the texts.
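
The last two steps can be sketched roughly as follows, assuming the embeddings are already available as lists of floats. All names here (`l2_normalize`, `similarity_score`, `score_color`) and the color thresholds are hypothetical choices for illustration; the source does not state the actual cut-offs or colors SemanticApp uses.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def similarity_score(emb_a, emb_b):
    """Cosine similarity computed as the dot product of normalized embeddings."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    # For unit-length vectors, the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def score_color(score, high=0.8, low=0.5):
    """Map a score to a display color.

    The thresholds and color names are assumptions, not SemanticApp's values.
    """
    if score >= high:
        return "green"
    if score >= low:
        return "orange"
    return "red"


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real DistilBERT embeddings.
    score = similarity_score([0.2, 0.9, 0.1], [0.25, 0.8, 0.05])
    print(f"{score:.3f}", score_color(score))
```

Normalizing first means the similarity reduces to a single dot product, which is why the step above refers to normalized embeddings.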
SemanticApp was developed in February 2024 by Ajbar Alae, who also maintains it. Feel free to reach out with questions, feedback, or contributions at alae1ajbar@gmail.com