3bdullahCS/Tourlyze

🏗️ Tourlyze — Smart Tourism Sentiment Analyzer

📌 Overview

An intelligent system that analyzes tourist sentiment toward Saudi cities by scraping reviews from Google Maps and analyzing them using an Arabic AI model (CAMeLBERT).

🔄 System Workflow (End-to-End)

User types "AlUla" → React sends request → C# creates report → C# sends to Python
→ Python scrapes reviews from Google Maps → Python analyzes sentiment with CAMeLBERT
→ Python returns results → C# saves to database → React displays charts

📁 File Structure


🔷 Section 1: Frontend (React + TypeScript)

1️⃣ index.html — Base HTML page

  • First file loaded by the browser
  • Loads Tajawal (Arabic) and Inter (English) fonts from Google Fonts
  • Loads TailwindCSS for styling
  • Defines project colors: dark blue (primary), green (secondary), gold (accent)
  • Contains <div id="root"> where React renders everything

2️⃣ index.tsx — React entry point

  • Connects the App component to the root HTML element
  • Uses React.StrictMode for error detection during development
  • Rarely modified

3️⃣ index.css — Global styles

  • Sets page direction to Right-to-Left (RTL)
  • Contains custom scrollbar styling

4️⃣ App.tsx ⭐ — Main frontend file (most important!)

A large file containing all components and pages:

Components:

  • Button — Unified button with 4 styles (primary, secondary, outline, ghost)
  • Input — Input field with icon
  • Card — White card with soft shadow
  • Navbar — Top navigation bar (logo + username + language toggle + logout)
  • Sidebar — Side menu (Dashboard, New Analysis, Reports, About)
  • DashboardLayout — General layout (Navbar + Sidebar + Content)

Charts:

  • SentimentChart — Pie chart for Positive/Negative/Neutral percentages
  • FrequencyChart — Bar chart for top 5 most frequent words
  • WordCloud — Word cloud with varying sizes and colors

Pages:

| Route | Page |
|---|---|
| / | Login |
| /signup | Sign Up |
| /forgot-password | Forgot Password |
| /city-input | City Input |
| /dashboard | Results Dashboard |
| /report | Report |
| /about | About |

5️⃣ src/types.ts — Data type definitions

  • Sentiment — Sentiment classification (Positive, Negative, Neutral)
  • Review — Review shape (text, source, date, author, rating)
  • User — User data (name, email, role)
  • AnalysisStats — Statistics (positive, negative, neutral counts)
  • WordFreq — Word + occurrence count
  • CityAnalysisData — Complete city analysis data

6️⃣ src/context/AuthContext.tsx — Authentication management

  • Stores JWT Token in localStorage
  • Auto-checks if user is logged in on page load
  • Provides: login(), register(), resetPassword(), logout()
  • Accessed via useAuth() hook

7️⃣ src/context/LanguageContext.tsx — Language management (Arabic/English)

  • Saves selected language in localStorage
  • Automatically switches page direction (RTL ↔ LTR)
  • Provides t('key') function for translated text
  • Accessed via useLanguage() hook

8️⃣ src/components/LanguageSwitcher.tsx — Language toggle button

  • Shows "English" if current language is Arabic, and "عربي" if English
  • Calls toggleLanguage() on click

9️⃣ src/translations/ar.ts — Arabic text strings

  • Dictionary with 100+ Arabic UI text entries

🔟 src/translations/en.ts — English text strings

  • Same dictionary in English

1️⃣1️⃣ src/services/reportApi.ts — Report API service

  • generateReport() — Sends city name to backend to start analysis
  • getReports() — Fetches list of previous reports
  • getLatestReport() — Fetches the most recent report
  • getReportById() — Fetches a specific report by ID
  • All requests include JWT Token for authentication

🔷 Section 2: Backend (C# .NET 8)

1️⃣2️⃣ SmartTourism.API/Program.cs — Server entry point

  • Connects SQLite database
  • Configures CORS (allows frontend from localhost:3000)
  • Sets up JWT authentication (validates token on every request)
  • Enables Swagger for API documentation

1️⃣3️⃣ SmartTourism.API/appsettings.json — Server configuration

  • Database path: SmartTourism.db
  • JWT secret key (used to sign and validate tokens)
  • Issuer and Audience settings for token

1️⃣4️⃣ SmartTourism.API/SmartTourism.API.csproj — Project definition

  • .NET 8
  • Libraries: BCrypt (password hashing), JWT Bearer (security), SQLite (database), Swagger (docs)

1️⃣5️⃣ Controllers/AuthController.cs ⭐ — Authentication controller

| Endpoint | Function |
|---|---|
| POST /api/auth/register | Register new account (hashes password with BCrypt) |
| POST /api/auth/login | Login (verifies password, issues JWT Token) |
| POST /api/auth/reset-password | Reset password |
| GET /api/auth/me | Get current user data (protected with [Authorize]) |
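To illustrate why a signed token lets the server skip session storage, here is a minimal Python sketch of issuing and verifying an HS256 token using only the standard library. It is not the project's code: the backend is C# and relies on the JWT Bearer library. The function names, the `sub` and `exp` claims, the one-hour lifetime, and the choice of HS256 are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the payload if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token: wrong segment count, bad base64, bad JSON
    if not isinstance(payload, dict) or payload.get("exp", 0) < time.time():
        return None
    return payload
```

Because the payload is covered by the signature, the server only needs the secret key to trust a token; nothing about the session has to be looked up. A production service should use a maintained JWT library rather than hand-rolled code like this.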

1️⃣6️⃣ Controllers/ReportsController.cs ⭐⭐ — Reports controller (most important backend file!)

| Endpoint | Function |
|---|---|
| POST /api/reports/generate | Receives city name → sends to Python → saves results → returns them |
| GET /api/reports | Fetches all user reports |
| GET /api/reports/latest | Fetches latest completed report |
| GET /api/reports/{id} | Fetches report by ID |

Smart features:

  • Uses SHA256 to prevent duplicate reports (returns existing report if same city + settings)
  • Uses SHA256 for review deduplication
  • Saves reviews in batches for better performance
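The three features above can be sketched as follows. This is a Python illustration only (the actual controller is C#), and the details are assumptions: the key fields follow the (userId + city + sources + date) description given later in this README, while the normalization rules, the `|` separator, and all function names are hypothetical.

```python
import hashlib
from typing import Iterator, List, Sequence


def report_key(user_id: str, city: str, sources: Sequence[str], date: str) -> str:
    """Derive a stable SHA256 key so an identical request maps to the same report."""
    # Assumed normalization: trim and lowercase the city, sort the sources.
    normalized = "|".join(
        [user_id, city.strip().lower(), ",".join(sorted(s.lower() for s in sources)), date]
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def review_hash(text: str) -> str:
    """Hash a review after collapsing whitespace, so trivial variants collide."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def unique_reviews(texts: Sequence[str]) -> List[str]:
    """Keep the first occurrence of each review, dropping hash duplicates."""
    seen = set()
    kept = []
    for text in texts:
        digest = review_hash(text)
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items for batched saves."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With a key like this, the generate endpoint can look up an existing report before calling the AI service, and the batch helper keeps each database write to a bounded number of rows.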

1️⃣7️⃣ Models/User.cs — Users table

Id (GUID) | FirstName | LastName | Email | PasswordHash | Role | CreatedAt

1️⃣8️⃣ Models/Report.cs — Reports table

Id | UserId | City | Sources | Status (Processing/Completed/Failed) | TotalReviews | PositiveCount | NegativeCount | NeutralCount | ReportJson

1️⃣9️⃣ Models/Review.cs — Individual reviews table

Id | ReportId | Source | ReviewText | PredictedLabel | Score (0-1 confidence) | KeywordsJson | ReviewHash

2️⃣0️⃣ Data/AppDbContext.cs — Database context

  • Defines 3 tables: Users, Reports, Reviews
  • Unique index on Email (no duplicates)
  • Index on ReportKey to prevent duplicate reports

2️⃣1️⃣ DTOs/AuthDtos.cs — Authentication data transfer objects

  • RegisterDto — Registration data (name + email + password)
  • LoginDto — Login data (email + password)
  • ResetPasswordDto — Password reset data
  • UserDto — User data sent to frontend (no password for security)

2️⃣2️⃣ DTOs/ReportDtos.cs — Report data transfer objects

  • GenerateReportDto — Report creation request (city + sources + date)
  • ReportSummaryDto — Report summary for list view
  • ReportDetailDto — Full report details with JSON

2️⃣3️⃣ Migrations/ — Database migrations (schema evolution)

  • InitialCreate — Created Users table
  • AddPasswordResetToken — Added password reset field
  • AddReportsAndReviews — Created Reports and Reviews tables

🔷 Section 3: AI Service (Python FastAPI)

2️⃣4️⃣ SmartTourism.ML/main.py ⭐⭐⭐ — AI server (most important file in the project!)

Part 1: Arabic Text Cleaning

  • Removes URLs, mentions (@), and hashtags (#)
  • Removes diacritics (Tashkeel) and Tatweel (ـــ)
  • Normalizes Alef variants (with Hamza or Madda): أ/إ/آ → ا
  • Detects and ignores commercial ads (phone numbers + promotional keywords)
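A minimal sketch of these cleaning steps, using only Python's `re` module. The patterns, the promotional keyword list, and the rule that an ad needs both a phone number and a promotional keyword are assumptions for illustration; the actual rules in main.py may differ.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\S+")          # drops the whole @user / #tag token
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")  # Tashkeel marks and superscript Alef
TATWEEL = "\u0640"                                    # elongation character ـ
ALEF_VARIANTS_RE = re.compile(r"[\u0623\u0625\u0622]")  # أ إ آ
PHONE_RE = re.compile(r"(?:\+?\d[\s-]?){9,}")         # 9 or more digits, optional separators
PROMO_KEYWORDS = ("للتواصل", "واتساب", "خصم", "عرض خاص")  # hypothetical keyword list


def clean_text(text: str) -> str:
    """Apply the cleaning steps in order and collapse leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_HASHTAG_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # normalize to bare Alef ا
    return " ".join(text.split())


def looks_like_ad(text: str) -> bool:
    """Flag a review as an ad when it has a phone number and a promo keyword."""
    has_phone = PHONE_RE.search(text) is not None
    has_promo = any(keyword in text for keyword in PROMO_KEYWORDS)
    return has_phone and has_promo
```

Reviews flagged by `looks_like_ad` would be skipped before sentiment analysis, and the rest passed through `clean_text` so the model sees normalized input.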

Part 2: Review Scraping

  • Opens Chrome in headless mode using Selenium
  • Searches for the location in Google Maps
  • Automatically clicks the "Reviews" tab
  • Scrolls down and collects reviews (text + star rating)
  • Avoids detection with anti-bot techniques

Part 3: AI Sentiment Analysis

  • Loads CAMeLBERT model (Arabic dialect-specific) at startup
  • Passes each review through the model → returns: label (Positive/Negative/Neutral) + confidence score
  • Extracts keywords from each review
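The model's per-review labels feed the dashboard's pie chart and top-5 word chart. The sketch below shows one way to aggregate them in plain Python. It assumes the labels are already predicted, and it uses simple word frequency with a tiny stopword list as a stand-in for keyword extraction, since the README does not specify the method; all names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

LABELS = ("Positive", "Negative", "Neutral")
STOPWORDS = {"في", "من", "على", "و", "جدا"}  # hypothetical, far from complete


def sentiment_counts(labels: Iterable[str]) -> Dict[str, int]:
    """Count predictions per label, always returning all three labels."""
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in LABELS}


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert counts to percentages for the pie chart (zeros if no reviews)."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


def top_words(reviews: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword tokens across cleaned reviews."""
    words = Counter()
    for review in reviews:
        words.update(token for token in review.split() if token not in STOPWORDS)
    return words.most_common(n)
```

The counts correspond to PositiveCount, NegativeCount, and NeutralCount on the report, and `top_words` produces the word and occurrence pairs that the frequency chart expects.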

Part 4: API

  • POST /api/analyze — Receives location name → scrapes → cleans → analyzes → returns results

2️⃣5️⃣ SmartTourism.ML/requirements.txt — Python dependencies

  • fastapi + uvicorn — Fast web server
  • selenium + webdriver-manager — Browser-based scraping
  • torch + transformers — AI libraries (PyTorch + HuggingFace)
  • pandas — Data processing

🔷 Section 4: Build Output (dist)

  • dist/index.html — Final built HTML page
  • dist/assets/index-*.js — All React code bundled into one file (675KB)
  • dist/assets/index-*.css — Minified styles

dist = the final production-ready version. Generated by running npm run build.


🔷 Section 5: Configuration Files

| File | Purpose |
|---|---|
| package.json | Frontend project definition and dependencies |
| vite.config.ts | Vite build tool settings (port 3000) |
| tsconfig.json | TypeScript configuration |
| .gitignore | Files excluded from Git |
| SmartTourism.sln | C# solution file |

🛠️ Tech Stack

| Technology | Usage |
|---|---|
| React + TypeScript | User Interface |
| TailwindCSS | Styling |
| Recharts + D3.js | Charts & Visualizations |
| C# .NET 8 | Backend API |
| Entity Framework | Database ORM |
| SQLite | Database |
| JWT + BCrypt | Authentication & Password Hashing |
| Python FastAPI | AI Service Server |
| Selenium | Web Scraping |
| CAMeLBERT | Arabic Sentiment Analysis Model |
| HuggingFace Transformers | AI Model Runtime |

🚀 Running the Project

# 1. Start the AI server (Python)
cd SmartTourism.ML
venv/bin/python main.py          # Runs on http://localhost:8000

# 2. Start the backend (C#)
cd SmartTourism.API
dotnet run                       # Runs on http://localhost:5165

# 3. Start the frontend (React)
npm run dev                      # Runs on http://localhost:3000

❓ Expected Discussion Questions

Q1: Why split the backend into two services (C# and Python)?
We split the system into two services. C# (.NET 8) handles the core API, data storage, and authentication. Python hosts the AI service because the model tooling the project relies on (HuggingFace Transformers and PyTorch) is Python-native. Keeping them separate lets each service be deployed and scaled independently.

Q2: How do you scrape data from Google Maps?
We use Selenium to open Chrome in headless mode, search for the location on Google Maps, click the reviews tab, and scroll automatically to collect reviews. We reduce the chance of bot detection by masking the automated browser's fingerprint.

Q3: What AI model is used and why?
We use CAMeLBERT-da-sentiment from the CAMeL Lab at NYU Abu Dhabi. We chose it because it is trained specifically on Arabic dialects (not just Modern Standard Arabic), and tourists typically write reviews in colloquial Arabic.

Q4: How are passwords secured?
We never store plaintext passwords. BCrypt hashes each password with a salt, and only the hash is saved. Sessions rely on JWTs, so the server does not need to store session data.

Q5: Why TypeScript instead of JavaScript?
TypeScript catches type errors at compile time, before the code runs, by enforcing type definitions. This reduces bugs and speeds up development.

Q6: What are DTOs and why use them?
Data Transfer Objects define exactly what data is sent to the client, without exposing all database columns (e.g., we send user data without the password hash).

Q7: How do you prevent duplicate reports and reviews?
We use SHA256 hashing. Each report gets a unique key derived from (userId + city + sources + date). If the same request is made again, we return the existing report instead of re-analyzing. Each review also stores a SHA256 hash (ReviewHash), which is used to drop duplicate reviews.

Q8: What happens if the Python server fails?
C# wraps the request in a try-catch. If the call fails, the report status is set to "Failed" and a 500 error is returned to the user, so the failure does not crash the rest of the system.

Q9: What's the difference between the src and dist folders?
src = development files (the source code we write). dist = the final built and minified version, ready to deploy.

Q10: How does the site support both Arabic and English?
We use the React Context API. LanguageContext stores the current language and switches page direction (RTL/LTR). All text lives in separate translation files (ar.ts and en.ts) and is accessed via the t('key') function.


Supervisor: Dr. Mufreh Al-Qahtani

Team: Ali Abdullah Al-Mastur • Abdullah Hussein Al-Awad • Abdullah Mosfer • Yazan Yahya • Mahdi Hamoud Al-Dosari • Abdulrahman Adawi
