<a href="https://colab.research.google.com/github/Alpha-Male-Dennis/4792_Data-Mining-Group-18/blob/main/CSC_4792_Project_G18.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Business Understanding

## Problem Statement

The Journal of Natural and Applied Sciences (JONAS) publishes research articles across multiple disciplines, including environmental science, agriculture, mining, engineering, and water resources. Manually categorizing these articles by discipline from their titles and abstracts is time-consuming and subjective. An automated classification model, built with natural language processing (NLP) and machine learning techniques, is needed to assign articles to their relevant disciplines accurately.

## Business Objectives

The primary business objectives are:

1. Efficient Article Classification – Automate the categorization of journal articles to streamline editorial workflows.
2. Improved Discoverability – Enhance search and retrieval of articles by discipline for researchers and readers.
3. Reduced Manual Effort – Minimize the need for manual tagging by editors, reducing human error and workload.

## Success Criteria
* The model should correctly classify articles into predefined disciplines with high accuracy (at least 85%, per the initial project success criteria).
* The solution should be scalable to handle new articles as the journal continues publishing.
* The classification system should be interpretable, allowing editors to verify and adjust categories if needed.

## Data Mining Goals

* Text Preprocessing – Clean and preprocess article titles and abstracts (tokenization, stopword removal, stemming/lemmatization).
* Feature Extraction – Convert text into numerical features using techniques like TF-IDF or word embeddings (Word2Vec, GloVe).
* Model Development – Implement a supervised classification model (e.g., Naïve Bayes, SVM, Random Forest, or Neural Networks) to predict article disciplines.
* Evaluation – Assess model performance using metrics such as accuracy, precision, recall, and F1-score.
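
The evaluation goal above can be illustrated with a minimal sketch. The labels here are hypothetical stand-ins, not results from the JONAS data; the point is only to show how accuracy and macro-averaged precision, recall, and F1 can diverge when one discipline dominates.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true and predicted discipline labels for six articles
y_true = ["Mining", "Mining", "Mining", "Mining", "Agriculture", "Agriculture"]
y_pred = ["Mining", "Mining", "Mining", "Mining", "Mining", "Agriculture"]

accuracy = accuracy_score(y_true, y_pred)

# Macro averaging weights every discipline equally, so small classes count
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(f"Accuracy:        {accuracy:.3f}")
print(f"Macro precision: {precision:.3f}")
print(f"Macro recall:    {recall:.3f}")
print(f"Macro F1:        {f1:.3f}")
```

In this toy case accuracy is higher than macro recall because the single error falls on the smaller class, which is why the macro-averaged metrics are worth reporting alongside accuracy on an imbalanced dataset.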

## Initial Project Success Criteria

* Accuracy: The model should achieve at least 85% accuracy in classifying articles into the correct discipline.
* Interpretability: The model should provide explainable predictions (e.g., feature importance in decision-making).
* Scalability: The solution should handle new, unseen articles without significant performance degradation.

# 2. Data Understanding

## Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 2.1 Load Raw Dataset

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/drive/MyDrive/JONAS dataset/journal_articles_final.csv')
print("Dataset loaded successfully!")

## 2.2 Initial Data Exploration

### Inspection

In [None]:
# Display the first 5 rows
print("First 5 rows:")
display(df.head())

# Dataset shape (number of articles, columns)
print("\nDataset shape:", df.shape)

# Column names, data types, and non-null counts
print("\nData types and non-null counts:")
df.info()  # info() prints directly and returns None, so it needs no display() wrapper

# Summary statistics for all columns (numeric and text)
print("\nSummary statistics:")
display(df.describe(include='all'))

### Discipline Distribution (Bar Chart)

In [None]:
import matplotlib.pyplot as plt

# Plot distribution of disciplines
plt.figure(figsize=(10, 6))
df['discipline'].value_counts().plot(kind='bar', color='skyblue')
plt.title('Distribution of Articles by Discipline')
plt.xlabel('Discipline')
plt.ylabel('Number of Articles')
plt.xticks(rotation=45, ha='right')
plt.show()
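
Alongside the bar chart, it can help to quantify the imbalance numerically. The sketch below uses a hypothetical label series in place of `df['discipline']` and flags classes with a single article, since those cannot appear in both a training and a test split.

```python
import pandas as pd

# Hypothetical labels standing in for df['discipline']
labels = pd.Series(
    ["Mining"] * 5 + ["Agriculture"] * 3 + ["Hydrology"] * 1,
    name="discipline",
)

counts = labels.value_counts()
shares = (counts / counts.sum()).round(3)

# Classes with a single article cannot appear in both train and test splits
singleton_classes = counts[counts < 2].index.tolist()

print(pd.DataFrame({"count": counts, "share": shares}))
print("Singleton classes:", singleton_classes)
```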

## 2.3 Text Data Exploration

### Word Frequency Analysis

In [None]:
from collections import Counter
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

# Combine all abstracts into a single lowercase string (missing abstracts become empty)
all_text = ' '.join(df['abstract'].fillna('').astype(str)).lower()

# Build the stopword set once; calling stopwords.words() for every token is very slow
stop_words = set(stopwords.words('english'))

# Tokenize on whitespace and remove stopwords
tokens = [word for word in all_text.split() if word not in stop_words]
word_freq = Counter(tokens).most_common(20)

# Plot top 20 frequent words on a single figure
fig, ax = plt.subplots(figsize=(10, 6))
pd.DataFrame(word_freq, columns=['Word', 'Frequency']).plot(x='Word', y='Frequency', kind='bar', color='teal', ax=ax)
ax.set_title('Top 20 Frequent Words (Excluding Stopwords)')
plt.show()
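
Splitting on whitespace leaves punctuation attached to words, so "water," and "water" are counted separately. A minimal sketch of a regex-based alternative follows; the stopword set and sample abstracts are small illustrative assumptions, and `count_terms` is a hypothetical helper rather than part of the notebook.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; the notebook uses NLTK's English list instead
STOPWORDS = {"the", "of", "in", "and", "a", "is", "was"}

def count_terms(texts, stopwords=STOPWORDS, top_n=3):
    """Count alphabetic tokens across texts, ignoring case and stopwords."""
    counter = Counter()
    for text in texts:
        # [a-z]+ keeps letters only, so "water," and "water" become one token
        tokens = re.findall(r"[a-z]+", text.lower())
        counter.update(t for t in tokens if t not in stopwords)
    return counter.most_common(top_n)

sample_abstracts = [
    "The quality of water in the mining area was poor.",
    "Water, soil and air quality near the mine.",
]
top_terms = count_terms(sample_abstracts)
print(top_terms)
```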

## 2.4 Data Quality Verification

### Missing Values Check

In [None]:
print("Missing values per column:")
display(df.isnull().sum())

### Duplicates Check

In [None]:
print("Number of duplicate articles:", df.duplicated(subset=['title', 'abstract']).sum())
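
An exact-match check can miss duplicates that differ only in letter case or spacing. The sketch below shows one way to test for that, using hypothetical articles and a hypothetical `normalize` helper; whether such near-duplicates exist in the JONAS data is not established here.

```python
import pandas as pd

# Hypothetical articles; rows 0 and 1 differ only in case and spacing
articles = pd.DataFrame({
    "title": ["Soil Erosion Study", "soil erosion  study ", "Copper Mining"],
    "abstract": ["We study erosion.", "We study erosion.", "We study copper."],
})

def normalize(series):
    """Lowercase, trim, and collapse runs of whitespace."""
    return series.fillna("").str.lower().str.strip().str.replace(r"\s+", " ", regex=True)

exact_dupes = articles.duplicated(subset=["title", "abstract"]).sum()

normalized = articles[["title", "abstract"]].apply(normalize)
near_dupes = normalized.duplicated().sum()

print("Exact duplicates:", exact_dupes)
print("Duplicates after normalization:", near_dupes)
```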

## 2.5 Initial Findings Summary

### Data Loading and Description:

The dataset, journal_articles_final.csv, was successfully loaded into a pandas DataFrame named df.
The dataset contains 47 articles and 7 columns: article_id, title, abstract, keywords, discipline, year, and volume.
The columns have the following data types: article_id, title, abstract, keywords, and discipline are objects (strings), while year and volume are integers.

### Data Exploration:

The distribution of articles by discipline shows that "Engineering" has the highest number of articles, followed by "Environmental Science" and "Agriculture".
The top 20 most frequent words in the abstracts (excluding stopwords) include terms like "study", "data", "water", "mining", and "construction", which align with the prominent disciplines in the dataset.

### Data Quality Verification:

There are missing values in the abstract (2 missing) and keywords (5 missing) columns.
There are no duplicate articles based on the combination of 'title' and 'abstract'.

# 3. Data Preparation

## 3.1 Setup and Loading

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_extraction.text import CountVectorizer
import nltk
import string

# Download necessary NLTK data for text preprocessing
nltk.download('punkt_tab')
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Mount Google Drive to access our datasets
from google.colab import drive
drive.mount('/content/drive')

# --- Load the Dataset ---
# NOTE: Change this path to the location of the file in your Google Drive
var_jonas_dataframe = pd.read_csv('/content/drive/MyDrive/JONAS dataset/journal_articles_final.csv')

print("Setup complete. Raw JONAS journal articles dataset loaded.")
# We will work with a copy to keep the original raw data intact
var_working_df = var_jonas_dataframe.copy()

## 3.2 Data Selection


In [None]:
# Define the list of columns we believe are most relevant for our initial model
var_relevant_columns = [
    'article_id', 'title', 'abstract', 'keywords', 'discipline'
]

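# Select only these columns from the working DataFrame.
# .copy() returns an independent DataFrame, so later edits do not trigger SettingWithCopyWarning.
var_selected_dataframe = var_working_df[var_relevant_columns].copy()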

print("--- Data Selection Complete ---")
print(f"Original number of columns: {len(var_jonas_dataframe.columns)}")
print(f"Number of columns after selection: {len(var_selected_dataframe.columns)}")
var_selected_dataframe.head(2).T

## 3.3 Data Preprocessing

### 3.3.1 Handling Missing Values (General)


In [None]:
# Step 1: Identify missing values in our selected data
print("--- Missing Values Before Preprocessing ---")
var_missing_counts = var_selected_dataframe.isnull().sum()
print(var_missing_counts[var_missing_counts > 0]) # Show only columns with missing data

# Step 2: Define and apply an imputation strategy
# For text columns (title, abstract, keywords), we'll fill missing values with empty strings
var_text_cols = ['title', 'abstract', 'keywords']

for var_col in var_text_cols:
    # Assign the result back: an inplace fillna on a selected column may not update the DataFrame
    var_selected_dataframe[var_col] = var_selected_dataframe[var_col].fillna('')

# For the discipline column (our target variable), drop rows with missing values
# because we cannot classify articles without knowing their discipline.
var_selected_dataframe = var_selected_dataframe.dropna(subset=['discipline']).copy()

# Step 3: Verify that the missing values have been handled
print("\n--- Missing Values After Preprocessing ---")
var_missing_after = var_selected_dataframe.isnull().sum()
print(var_missing_after[var_missing_after > 0])

# Step 4: Additional preprocessing for text data
# Clean text data by removing extra whitespace and ensuring consistent formatting
var_selected_dataframe['title'] = var_selected_dataframe['title'].str.strip()
var_selected_dataframe['abstract'] = var_selected_dataframe['abstract'].str.strip()
var_selected_dataframe['keywords'] = var_selected_dataframe['keywords'].str.strip()

# Step 5: Check the final shape of the dataset
print("\n--- Final Dataset Shape ---")
print(f"Rows: {var_selected_dataframe.shape[0]}, Columns: {var_selected_dataframe.shape[1]}")

# Display a sample of the preprocessed data
print("\n--- Sample of Preprocessed Data ---")
print(var_selected_dataframe.head(3))

### 3.3.2 Handling Duplicate Values

In [None]:
var_num_duplicates_before = var_selected_dataframe.duplicated(subset=['title', 'abstract']).sum()
print(f"Number of duplicate articles found: {var_num_duplicates_before}")

# .copy() makes the result independent, so new columns can be added later without warnings
var_preprocessed_dataframe = var_selected_dataframe.drop_duplicates(subset=['title', 'abstract'], keep='first').copy()
print(f"Shape after removing duplicates: {var_preprocessed_dataframe.shape}")

### 3.3.3 Text Pre-processing

In [None]:
def fxn_convert_to_lowercase(var_text):
    if isinstance(var_text, str):
        return var_text.lower()
    return ""

def fxn_remove_punctuation(var_text):
    if isinstance(var_text, str):
        return "".join([var_char for var_char in var_text if var_char not in string.punctuation])
    return ""

# Build the stopword set and the stemmer once instead of on every call
var_stop_words = set(stopwords.words('english'))
var_stemmer = PorterStemmer()

def fxn_remove_stopwords(var_text):
    if isinstance(var_text, str):
        var_tokens = word_tokenize(var_text)
        var_filtered_tokens = [var_word for var_word in var_tokens if var_word not in var_stop_words]
        return " ".join(var_filtered_tokens)
    return ""

def fxn_stem_text(var_text):
    if isinstance(var_text, str):
        var_tokens = word_tokenize(var_text)
        var_stemmed_tokens = [var_stemmer.stem(var_word) for var_word in var_tokens]
        return " ".join(var_stemmed_tokens)
    return ""


def fxn_preprocess_text_pipeline(var_text):
    if not isinstance(var_text, str):
        return ""
    var_processed_text = fxn_convert_to_lowercase(var_text)
    var_processed_text = fxn_remove_punctuation(var_processed_text)
    var_processed_text = fxn_remove_stopwords(var_processed_text)
    var_processed_text = fxn_stem_text(var_processed_text)
    return var_processed_text

# Apply the preprocessing pipeline to the relevant text columns
var_preprocessed_dataframe['CleanedTitle'] = var_preprocessed_dataframe['title'].apply(fxn_preprocess_text_pipeline)
var_preprocessed_dataframe['CleanedAbstract'] = var_preprocessed_dataframe['abstract'].apply(fxn_preprocess_text_pipeline)
var_preprocessed_dataframe['CleanedKeywords'] = var_preprocessed_dataframe['keywords'].apply(fxn_preprocess_text_pipeline)

print("--- Text Pre-processing Complete ---")
# Display the original and cleaned columns for comparison
display(var_preprocessed_dataframe[['title', 'CleanedTitle', 'abstract', 'CleanedAbstract', 'keywords', 'CleanedKeywords']].head())

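### 3.3.4 Step-by-Step View of the Transformation

In [None]:
# Select a sample abstract to process (missing abstracts were filled with '', so skip empty strings)
var_non_empty_abstracts = var_preprocessed_dataframe['abstract'].dropna()
var_sample_text = var_non_empty_abstracts[var_non_empty_abstracts.str.len() > 0].iloc[0]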

print(f"--- 1. ORIGINAL TEXT ---\n'{var_sample_text}'\n")

# Apply step 1: Lowercasing
var_lowercase_text = fxn_convert_to_lowercase(var_sample_text)
print(f"--- 2. AFTER LOWERCASE ---\n'{var_lowercase_text}'\n")

# Apply step 2: Remove Punctuation
var_no_punct_text = fxn_remove_punctuation(var_lowercase_text)
print(f"--- 3. AFTER REMOVING PUNCTUATION ---\n'{var_no_punct_text}'\n")

# Apply step 3: Remove Stopwords
var_no_stopwords_text = fxn_remove_stopwords(var_no_punct_text)
print(f"--- 4. AFTER REMOVING STOPWORDS ---\n'{var_no_stopwords_text}'\n")

# Apply step 4: Stemming
var_stemmed_text = fxn_stem_text(var_no_stopwords_text)
print(f"--- 5. FINAL STEMMED TEXT ---\n'{var_stemmed_text}'\n")

## 3.4 Data Transformation

### 3.4.1 Transforming Categorical Data (Encoding)

This step converts the categorical discipline names into a numerical format using one-hot encoding.

In [None]:
# Perform One-Hot Encoding on 'discipline'
var_dummies_dataframe = pd.get_dummies(var_preprocessed_dataframe['discipline'], prefix='discipline')

# Join the new dummy columns back to our main DataFrame
var_transformed_dataframe = pd.concat([var_preprocessed_dataframe, var_dummies_dataframe], axis=1)

print("--- DataFrame after One-Hot Encoding ---")
# Display the original column and the new binary columns
display(var_transformed_dataframe[['discipline'] + [col for col in var_transformed_dataframe.columns if col.startswith('discipline_')]].head())
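
One-hot encoding yields one binary column per discipline, whereas the modelling section later encodes the target with `LabelEncoder`. The sketch below contrasts the two on a small hypothetical label series, to show what each representation looks like.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical discipline labels for four articles
disciplines = pd.Series(["Mining", "Agriculture", "Mining", "Hydrology"])

# One-hot: one binary column per discipline
one_hot = pd.get_dummies(disciplines, prefix="discipline")

# Label encoding: a single integer per article, usable directly as a 1-D target y
encoder = LabelEncoder()
y = encoder.fit_transform(disciplines)

# inverse_transform maps the integers back to discipline names
decoded = encoder.inverse_transform(y)

print(one_hot.shape)
print(list(encoder.classes_), y.tolist())
```

Classes are numbered in sorted order, so the mapping is stable as long as the same fitted encoder is reused at prediction time.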

### 3.4.2 Transforming Text Data (Bag-of-Words)

This section transforms the cleaned text data into numerical features using the Bag-of-Words model.

In [None]:
# Feature Engineering
# Combine the cleaned text columns into a single column
var_preprocessed_dataframe['CombinedText'] = (
    var_preprocessed_dataframe['CleanedTitle'] + ' '
    + var_preprocessed_dataframe['CleanedAbstract'] + ' '
    + var_preprocessed_dataframe['CleanedKeywords']
)

# The text columns were already cleaned by fxn_preprocess_text_pipeline in section 3.3.3,
# so the combined column can be vectorized directly.

# Step 1: Transform the combined text into a Bag-of-Words representation
var_vectorizer = CountVectorizer(max_features=1000)  # Limit to the 1000 most frequent terms
var_bow_matrix = var_vectorizer.fit_transform(var_preprocessed_dataframe['CombinedText'])

# Step 2: Convert the result into a DataFrame for easy viewing,
# keeping the article index so rows stay aligned with var_preprocessed_dataframe
var_bow_dataframe = pd.DataFrame(var_bow_matrix.toarray(),
                                 columns=var_vectorizer.get_feature_names_out(),
                                 index=var_preprocessed_dataframe.index)

print("--- Bag-of-Words Transformation for Combined Article Text ---")
print("Original text data shape:", var_preprocessed_dataframe['CombinedText'].shape)
print("Transformed BoW data shape:", var_bow_dataframe.shape)
print("\nSample of BoW DataFrame (showing word counts per article):")
display(var_bow_dataframe.head())
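
The data mining goals also mention TF-IDF as a feature extraction option. As a hedged sketch of that alternative, the example below applies scikit-learn's `TfidfVectorizer` to three hypothetical documents; it is not the transformation used in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical pre-cleaned documents
docs = [
    "water quality mine water",
    "soil quality crop yield",
    "mine safety mine design",
]

vectorizer = TfidfVectorizer(max_features=1000)
tfidf_matrix = vectorizer.fit_transform(docs)

# Look up column positions of two terms to compare their weights in the first document
water_col = vectorizer.vocabulary_["water"]
quality_col = vectorizer.vocabulary_["quality"]
water_weight = tfidf_matrix[0, water_col]
quality_weight = tfidf_matrix[0, quality_col]

print(tfidf_matrix.shape)
print(f"water: {water_weight:.3f}, quality: {quality_weight:.3f}")
```

Unlike raw counts, TF-IDF down-weights terms that occur in many documents, so "water" (frequent in one document only) scores higher in the first document than "quality" (shared across two).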

## Conclusion

In this notebook, we have successfully completed the data preparation phase for classifying journal articles by discipline. We started by loading and exploring the dataset to understand its structure and identify initial data quality issues.

The key steps undertaken in this phase were:

1.  **Data Cleaning:** We handled missing values in the text columns ('title', 'abstract', and 'keywords') by filling them with empty strings, dropped any rows without a 'discipline' label, and confirmed that there were no duplicate articles based on title and abstract.
2.  **Feature Engineering:** We created a combined text feature by concatenating the cleaned 'title', 'abstract', and 'keywords' to provide a comprehensive text representation for each article.
3.  **Data Transformation:** We converted the categorical 'discipline' labels into a numerical format using One-Hot Encoding and transformed the combined text data into a numerical Bag-of-Words representation.

The data is now cleaned, feature-engineered, and transformed into a format suitable for machine learning classification models. The next steps are to select, train, and evaluate different models to determine the best approach for automated article classification.

# 4. Modelling

## Algorithm Selection
For this journal article classification project, we selected three classification algorithms to compare their performance:

1. **Random Forest Classifier:** Chosen because it handles high-dimensional data such as text embeddings, provides feature importance insights, and is robust against overfitting through its ensemble approach.

2. **Support Vector Machine (SVM):** Chosen because it is particularly effective in high-dimensional spaces (like our 384-dimensional embeddings), works well when classes can be separated by linear decision boundaries, and handles complex classification tasks.

3. **Logistic Regression:** Chosen as a strong baseline classifier that works well with dense numerical features like embeddings, provides probability estimates, and is computationally efficient.



## Data Preparation and Splitting

### Dataset Characteristics:
1. Total samples: 46 research papers

2. Number of classes: 14 different academic disciplines

3. Class distribution: Highly imbalanced (Mining Engineering: 14 samples; some disciplines: only 1 sample)


### Splitting Strategy:
1. Test size: 25% of the data (12 samples)

2. Training size: 75% of the data (34 samples)

3. Challenge: A stratified split, which would maintain class proportions, was attempted but failed because some classes have only 1 member.

4. Solution: A non-stratified random split with random_state=42 was used for reproducibility.
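
The stratification failure described above can be reproduced on a small scale. The sketch below uses hypothetical data in which one class has a single member; `train_test_split` raises a `ValueError` when asked to stratify, and the code falls back to a plain random split, mirroring the strategy chosen here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 9 samples, with class 2 having a single member
X = np.arange(18).reshape(9, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2])

try:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y
    )
    used_stratify = True
except ValueError:
    # Raised because stratification needs at least 2 members in every class
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )
    used_stratify = False

print("Stratified:", used_stratify)
print("Train size:", len(X_train), "Test size:", len(X_test))
```

A consequence of the fallback is that some classes may appear only in the test set or only in the training set, which limits how much the held-out scores can say about rare disciplines.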


### Feature Engineering:

1. Used Sentence-BERT embeddings ('all-MiniLM-L6-v2') to convert text to 384-dimensional numerical vectors

2. Combined title and abstract for comprehensive document representation

3. Encoded target labels using LabelEncoder


### Initial Results

The LOOCV accuracy scores on the training set (34 samples) were:

* Random Forest: 14.7% accuracy

* SVM: 26.5% accuracy

* Logistic Regression: 32.4% accuracy

#### Observations:

* The low scores are expected due to the small dataset, high dimensionality (384 features), many classes (14) with severe imbalance, and the complexity of text classification.

* Logistic Regression performed best, which is typical for limited data because it estimates fewer parameters than more complex models.
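
To put low accuracy figures in context, it can be useful to compare them with a majority-class baseline evaluated under the same leave-one-out scheme. The sketch below does this with scikit-learn's `DummyClassifier` on hypothetical imbalanced labels; the numbers are illustrative only and are not computed from the JONAS data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical imbalanced labels: 6, 3, and 1 samples per class
y = np.array([0] * 6 + [1] * 3 + [2] * 1)
X = np.zeros((len(y), 4))  # features are ignored by the dummy model

# Always predicts the most frequent class seen in each training fold
baseline = DummyClassifier(strategy="most_frequent")
scores = cross_val_score(baseline, X, y, cv=LeaveOneOut(), scoring="accuracy")
baseline_accuracy = scores.mean()

print(f"Majority-class LOOCV accuracy: {baseline_accuracy:.3f}")
```

A trained model is only adding value if it beats this baseline, so running the same check on the real training labels would show how the reported scores compare with always guessing the largest discipline.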

### Modelling Code

In [None]:
# ===== MODELING =====

print("="*50)
print("MODELING PHASE")
print("="*50)

# Import additional libraries needed for modeling and embeddings
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Import libraries for Sentence Embeddings
!pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import torch  # Used to check for GPU availability


# 1. Select appropriate data mining algorithms
# We use three classification algorithms that work with dense embeddings:
# 1. Random Forest - handles high dimensionality well, provides feature importance,
#    and is robust to overfitting.
# 2. Support Vector Machine (SVM) - effective in high-dimensional spaces like text data.
# 3. Logistic Regression - used in place of Multinomial Naive Bayes. MultinomialNB is
#    traditionally strong for text classification on count features, but it requires
#    non-negative inputs, so it is not suitable for dense embeddings, which contain
#    negative values.


# Prepare the data for modeling
print("Preparing data for modeling...")

# Use the preprocessed data
var_model_df = var_preprocessed_dataframe.copy()

# Check if we have the cleaned text columns
if 'CleanedTitle' not in var_model_df.columns or 'CleanedAbstract' not in var_model_df.columns:
    print("Error: Text preprocessing not completed. Please run the text preprocessing section first.")
else:
    # Combine text features.
    # Sentence embedding models are designed for natural text, so we use the original
    # title and abstract rather than the stemmed, stopword-free cleaned columns.
    var_model_df['CombinedText'] = var_model_df['title'].fillna('') + " " + var_model_df['abstract'].fillna('')


    # Analyze class distribution (for information, not filtering)
    class_distribution = var_model_df['discipline'].value_counts()
    print("Class distribution (using all data):")
    display(class_distribution)

    # --- Using all available samples; no classes are filtered out ---

    # Encode the target variable
    label_encoder = LabelEncoder()
    var_y_encoded = label_encoder.fit_transform(var_model_df['discipline'])
    class_names = label_encoder.classes_

    print(f"\nNumber of classes: {len(class_names)}")
    print("Classes:", class_names)
    print(f"Total samples: {len(var_model_df)}") # Now this will be 46


    # 2. Generate Sentence Embeddings
    print("\nGenerating Sentence Embeddings...")

    # Choose a pre-trained Sentence-BERT model
    # 'all-MiniLM-L6-v2' is a good balance of size and performance
    # Check for GPU availability
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"Using device: {device}")
    embedding_model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

    # Generate embeddings for the combined text
    # Process in batches if necessary for large datasets, but 46 is small
    var_X = embedding_model.encode(var_model_df['CombinedText'].tolist(), show_progress_bar=True)

    print(f"Sentence embeddings generated. Shape: {var_X.shape}")


    # Split data into training and testing sets
    print("\nSplitting data into training and testing sets...")

    # Adjust test size or splitting strategy for the full, imbalanced dataset
    # Given the very small size (46) and many classes, stratified split is preferred if possible.
    test_size = 0.25 # Adjusted test size

    try:
        # Use Stratified train_test_split if possible
        var_X_train, var_X_test, var_y_train, var_y_test = train_test_split(
            var_X, var_y_encoded, test_size=test_size, random_state=42, stratify=var_y_encoded
        )
        print(f"Using Stratified train_test_split with test size: {test_size:.2f}")

    except ValueError as e:
        print(f"Could not perform stratified split ({e}). Trying non-stratified split.")
        # Fall back to a non-stratified split when stratification fails
        # (stratify requires at least 2 samples in every class)
        var_X_train, var_X_test, var_y_train, var_y_test = train_test_split(
            var_X, var_y_encoded, test_size=test_size, random_state=42, stratify=None
        )
        print(f"Using Non-Stratified train_test_split with test size: {test_size:.2f}")


    print(f"Training set size: {var_X_train.shape[0]}")
    print(f"Testing set size: {var_X_test.shape[0]}")
    print(f"Number of features (embedding dimensions): {var_X_train.shape[1]}")


    # 3. Train the chosen models
    print("\nTraining models...")

    # Initialize the models suitable for dense embeddings and imbalance
    models = {
        'Random Forest': RandomForestClassifier(
            n_estimators=50,
            random_state=42,
            class_weight='balanced_subsample', # Use balanced_subsample for better handling with small samples
            max_depth=5 # Limit depth to prevent overfitting
        ),
        'SVM': SVC(
            kernel='linear',
            random_state=42,
            probability=True,
            C=1.0,
            class_weight='balanced'
        ),
        'Logistic Regression': LogisticRegression( # Replaced Naive Bayes
             random_state=42,
             max_iter=1000, # Increase max_iter for convergence
             class_weight='balanced'
        )
    }

    # Train each model and store results
    trained_models = {}
    training_results = {}

    # Use appropriate cross-validation strategy
    # With very small and imbalanced classes, LOOCV might be more informative
    from sklearn.model_selection import LeaveOneOut
    cv = LeaveOneOut()
    print("\nUsing Leave-One-Out cross-validation for robust evaluation on small dataset.")


    for name, model in models.items():
        print(f"Training {name}...")
        model.fit(var_X_train, var_y_train) # Train on the training split

        trained_models[name] = model

        # Cross-validation to assess training performance using LOOCV
        try:
            # Use the training data for LOOCV
            cv_scores = cross_val_score(model, var_X_train, var_y_train, cv=cv, scoring='accuracy')
            training_results[name] = {
                'cv_mean_accuracy': cv_scores.mean(),
                'cv_std_accuracy': cv_scores.std(),
                'cv_scores': cv_scores
            }
            print(f"  {name} LOOCV Accuracy (on Training Set): {cv_scores.mean():.3f}") # LOOCV std is not meaningful
        except Exception as e:
            print(f"  Could not perform cross-validation for {name}: {e}")
            training_results[name] = {
                'cv_mean_accuracy': np.nan,
                'cv_std_accuracy': np.nan,
                'cv_scores': []
            }


    print("\nModel training completed successfully!")

## Save the Models

In [None]:
import joblib
import os

# Create output directory if it doesn't exist
output_dir = "/content/drive/MyDrive/JONAS dataset/"
os.makedirs(output_dir, exist_ok=True)

# Save models and encoder
print("\nSaving models and encoder to Google Drive...")

# Save the label encoder (needed to decode predictions)
joblib.dump(label_encoder, os.path.join(output_dir, "label_encoder_final_g18.pkl"))

# Save each trained model
for name, model in trained_models.items():
    filename = os.path.join(output_dir, f"{name.replace(' ', '_').lower()}_model.pkl")
    joblib.dump(model, filename)
    print(f"  Saved {name} model to {filename}")

print("\nAll models and encoder saved successfully!")
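
The saved files can later be reloaded to classify new articles. The sketch below shows a round trip with `joblib` using hypothetical random features in place of real embeddings and a temporary directory in place of Google Drive; file names and data are assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for the notebook's embeddings and discipline labels
rng = np.random.default_rng(42)
X = rng.normal(size=(12, 8))
labels = ["Mining", "Agriculture", "Hydrology"] * 4

encoder = LabelEncoder()
y = encoder.fit_transform(labels)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "logistic_regression_model.pkl")
    encoder_path = os.path.join(tmp_dir, "label_encoder.pkl")
    joblib.dump(model, model_path)
    joblib.dump(encoder, encoder_path)

    # Reload both objects, as a separate prediction script would
    loaded_model = joblib.load(model_path)
    loaded_encoder = joblib.load(encoder_path)

# Predictions come back as integers; the encoder maps them to discipline names
predicted_names = loaded_encoder.inverse_transform(loaded_model.predict(X[:3]))
print(list(predicted_names))
```

In the notebook's setting, new titles and abstracts would first need to be embedded with the same 'all-MiniLM-L6-v2' model before being passed to a reloaded classifier, since the classifiers were trained on those 384-dimensional vectors.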
