<a href="https://colab.research.google.com/github/Eaby/NLP_Codes/blob/main/NU_IUI_Text_Analytics_Visualisation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Text Analytics & Visualisation***

********************************************************************************

# **Task 1 : Word Cloud**

Text visualization in Natural Language Processing (NLP) refers to the use of graphical or visual representations to depict and convey information extracted from textual data. It serves as a powerful tool for making sense of large volumes of text, revealing patterns, relationships, and insights that might be challenging to discern through traditional text-based analysis alone. Text visualization techniques help researchers, analysts, and decision-makers gain a more intuitive and comprehensive understanding of textual data.

We are going to use the **wordcloud** library to generate a word cloud visualization from a given text.

**Libraries Used:**

**'numpy'** is imported as **'np'**. Although not used in this specific code, it's a common practice to import **'numpy'** as **'np'** for mathematical operations.

**matplotlib.pyplot** is imported as **'plt'** to create and display the word cloud visualization.

**WordCloud** and **STOPWORDS** are imported from the **'wordcloud'** library. **WordCloud** is the primary class for generating word clouds, and **STOPWORDS** contains a set of common English words that are often excluded from word clouds because they don't convey significant meaning.


**generate_wordcloud** is a function that takes one argument, **'text'**, which is the input text from which the word cloud will be generated. It first creates a set of stopwords using the **STOPWORDS** set from the **wordcloud** library. These stopwords are words like "the," "is," "and," etc., which are typically filtered out of word clouds. Then, it initializes a **WordCloud** object with the following parameters:


> **background_color='white'**: Sets the background color of the word cloud to white.

>**stopwords=stopwords**: Specifies the set of stopwords to be used.

>**max_words=200**: Limits the maximum number of words to be displayed in the word cloud to 200.

>**max_font_size=40**: Sets the maximum font size for words in the word cloud to 40.

>**random_state=42**: Provides a seed for the random number generator, ensuring reproducibility of the word cloud layout.


The WordCloud object then generates the word cloud from the input text. The result is displayed with **matplotlib.pyplot**: **plt.figure** sets the figure size, **plt.imshow** renders the word cloud image, **plt.axis('off')** removes the axis labels and ticks from the plot, and **plt.show()** displays the plot.

sample_text is the variable containing the sample text from which the word cloud will be generated. It is a multi-line string containing a description. You can replace this static text input with real-time text sources or other text files with large content.

When we run this code, it will generate a word cloud based on the words in the sample_text and display it as a graphical representation where the most frequent words will be larger and more prominent in the visualization.
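
To make the link between word frequency and word size concrete, here is a minimal sketch of the counting step that conceptually sits behind a word cloud: lowercase the text, drop stopwords, and tally what remains. The `SMALL_STOPWORDS` set and the `top_words` helper are illustrative assumptions, not part of the wordcloud library, which applies its own tokenization and a much larger stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# the STOPWORDS set shipped with wordcloud is much larger.
SMALL_STOPWORDS = {"a", "and", "is", "it", "of", "the", "to", "in", "too"}

def top_words(text, n=5, stopwords=SMALL_STOPWORDS):
    """Return the n most frequent non-stopword tokens with their counts."""
    # Keep runs of letters, allowing internal hyphens such as "sub-field".
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

sample = "Computers read text. Computers interpret text. Humans read text too."
print(top_words(sample, n=3))
```

In this sample, "text" occurs most often, so it is the word a word cloud would draw largest.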

In [None]:
pip install wordcloud matplotlib

In [None]:
#************************************************************************************************
# Developer: Eaby Kollonoor Babu
# Version: 2.2
# Last Updated: 2023-08-19
# Contact:eaby.asha@gmail.com

# Description
"""
Text Analytics & Visualisation using Python(for educational purpose only)

A set of programs that perform text analysis and visualization tasks such as
Word Cloud, Heatmaps, and Geospatial Visualization with and without NLP.

"""

# License and Copyright Notice
"""
Copyright (c) 2023 Eaby Kollonoor Babu

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), for the sole
purpose of educational and non-commercial use, without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute,
or sublicense copies of the Software, and to permit persons to whom the Software
is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS" FOR EDUCATIONAL USE ONLY, WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
"""

# Changelog/Release Notes
"""
Changelog:

- Version 1.0 (2023-08-05): Basic skeleton of the code.
- Version 1.1 (2023-08-06): Added Heatmap to the code.
- Version 2.1 (2023-08-09): Added Geospatial Visualization without NLP to the code.
- Version 2.2 (2023-08-19): Added Geospatial Visualization with NLP to the code.

"""

# Feedback
"""
For questions or feedback, feel free to email me at eaby.asha@gmail.com
"""

#************************************************************************************************

import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

def generate_wordcloud(text):
    stopwords = set(STOPWORDS)
    wordcloud = WordCloud(
        background_color='white',
        stopwords=stopwords,
        max_words=200,
        max_font_size=40,
        random_state=42
    ).generate(text)

    plt.figure(figsize=(10, 8))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis('off')
    plt.show()

sample_text = """
    Natural language processing is a sub-field of artificial intelligence,
    in which its depth involves the interactions between computers and humans.
    Through NLP, it is possible for computers to read text, interpret it,
    measure sentiment and determine which parts are important.
"""

generate_wordcloud(sample_text)


**Note:** You can customize the appearance and behavior of the word cloud using various parameters in the WordCloud class. For instance, background_color sets the background color, max_words sets the maximum number of words to display, and max_font_size sets the largest font size in the word cloud.

**Note:** To use a custom shape for your word cloud, pass a mask array (for example, a binary image loaded as a NumPy array) through the mask parameter of WordCloud. Pure white pixels in the mask are treated as masked out, so draw the shape you want in black (or any non-white color) on a white background; the words will fill the non-white region.
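
As a rough illustration of the mask parameter, the sketch below builds a circular mask with NumPy instead of loading an image file. In the wordcloud library, pure white (255) mask entries are masked out and words are drawn on the remaining entries, so the circle is black and the surroundings are white. The helper name `make_circle_mask` and the size and radius values are assumptions chosen for the example.

```python
import numpy as np

def make_circle_mask(size=400):
    """Build a uint8 mask: 0 (black) inside a centred circle, 255 (white) outside."""
    yy, xx = np.ogrid[:size, :size]
    centre = (size - 1) / 2
    radius = size * 0.45
    outside = (xx - centre) ** 2 + (yy - centre) ** 2 > radius ** 2
    return (outside * 255).astype(np.uint8)

mask = make_circle_mask(400)
# Assumed usage with the class imported earlier in this notebook:
# WordCloud(background_color="white", mask=mask).generate(sample_text)
```

When a mask is supplied, the word cloud takes its width and height from the mask array.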


********************************************************************************

# **Task 2 : Heatmaps**

Heatmaps are a common data visualization technique used in Natural Language Processing (NLP) to represent relationships, patterns, or distributions within textual data. In a heatmap, data values are displayed as a grid of colored squares, with each square's color intensity representing the magnitude of a specific value or the degree of association between two variables. In the context of NLP, heatmaps can be applied in various ways including Word Co-occurrence, Text Similarity, Word Frequency, and Sequence Analysis.

The Python script below produces one heatmap for each of these four uses. Its main parts are described here.

>**Sample Text Data:**
**text_data**: A list of sample text sentences for analysis. These sentences represent different aspects of natural language processing and machine learning.

>**Tokenization of Text Data:**
**tokenized_data:** Tokenization is the process of splitting text into words or tokens. In this code, each sentence in **text_data** is tokenized into a list of words for further analysis.

>**Function to Generate and Display Heatmaps:**
**generate_heatmap(data, labels, title)**: A function that generates and displays a customized heatmap using Seaborn and Matplotlib. It takes data, labels, and a title as input parameters and displays a heatmap with annotations.

>**Word Frequency Heatmap:**
Calculates the word frequency of all words in the tokenized_data.
Selects the top N most frequent words (defined by top_n_words).
Generates and displays a Word Frequency Heatmap using the generate_heatmap function.

>**Word Co-occurrence Heatmap:**
Calculates the co-occurrence matrix of words in the text_data.
Selects the first N words of the vectorizer's alphabetically sorted vocabulary (defined by top_n_words_cooc); the words are not ranked by relevance.
Generates and displays a Word Co-occurrence Heatmap using the generate_heatmap function.

>**Text Similarity Heatmap:**
Uses TF-IDF (Term Frequency-Inverse Document Frequency) vectors and cosine similarity to measure how similar the text documents are to one another.
Generates and displays a Text Similarity Heatmap using the generate_heatmap function.

>**Print Document Indices and Corresponding Text Data:**
Prints the document indices and their corresponding text data to help users understand the numbering used in the Text Similarity Heatmap.

>**Sequence Analysis Heatmap (Word Presence):**
Creates a binary matrix with one row per sentence and one column per word position; a cell is 1 if the sentence has a word at that position and 0 otherwise.
Generates and displays a Sequence Analysis Heatmap (Word Presence) using the generate_heatmap function.

>**Print Legend for Sequence Analysis Heatmap (Word Presence):**
Prints a legend explaining the meaning of 0 and 1 in the Sequence Analysis Heatmap.
It also describes what the X and Y axes represent.
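
The co-occurrence step can be hard to picture, so here is a small pure-Python sketch of the same idea: build a binary document-term matrix X and compute C = XᵀX, where C[i][j] is the number of documents that contain both word i and word j. The `cooccurrence` helper and the three toy documents are assumptions for illustration; whitespace tokenization is used here for simplicity, whereas CountVectorizer applies its own tokenizer.

```python
def cooccurrence(docs):
    """Count, for every word pair, how many documents contain both words."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Binary document-term matrix: rows are documents, columns are words.
    X = [[0] * len(vocab) for _ in docs]
    for r, d in enumerate(docs):
        for w in set(d.lower().split()):
            X[r][index[w]] = 1
    # C = X^T X; the diagonal holds each word's document frequency.
    C = [[sum(X[r][i] * X[r][j] for r in range(len(docs)))
          for j in range(len(vocab))]
         for i in range(len(vocab))]
    return vocab, C

docs = ["deep learning models", "machine learning models", "deep sea"]
vocab, C = cooccurrence(docs)
i, j = vocab.index("learning"), vocab.index("models")
print(C[i][j])  # number of documents containing both "learning" and "models"
```

Because the matrix is binary, repeated words within one document count only once, which is also the effect of `binary=True` in CountVectorizer.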

In [None]:
#********************************
# Developer: Eaby Kollonoor Babu
# Version: 2.2
# Added: 2023-08-06
# Contact:eaby.asha@gmail.com
#********************************

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from nltk import FreqDist
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Different sample text data
text_data = [
    "Social media platforms connect people from all over the world.",
    "Machine learning algorithms can analyze large datasets efficiently.",
    "Text summarization techniques condense lengthy documents into concise summaries.",
    "Deep learning models have revolutionized image and speech recognition.",
    "Natural language generation can automatically create human-like text.",
]

# Tokenize the text data (for sequence analysis)
tokenized_data = [sentence.split() for sentence in text_data]

# Function to generate and display a customized heatmap.
# ylabels defaults to labels, so square matrices only need one label list.
def generate_heatmap(data, labels, title, ylabels=None):
    if ylabels is None:
        ylabels = labels
    plt.figure(figsize=(8, 6))
    sns.set(font_scale=1.2)
    sns.set_style("whitegrid")  # Customize the plot style
    sns.heatmap(data, annot=True, fmt=".2f", cmap="Blues", linewidths=1, linecolor='black', cbar=False,
                xticklabels=labels, yticklabels=ylabels, annot_kws={"size": 12})
    plt.title(title)
    plt.xticks(rotation=45, ha="right")  # Rotate x-axis labels for better readability
    plt.yticks(rotation=0)
    plt.tight_layout()
    plt.show()

# Word Frequency Heatmap with top N most frequent words
top_n_words = 5  # Set the number of top words to display
# Flatten the token lists with plain Python so the keys stay ordinary strings
word_freq = FreqDist(word for sentence in tokenized_data for word in sentence)
most_common_words = word_freq.most_common(top_n_words)  # Get the most frequent words
word_labels = [word for word, _ in most_common_words]  # Extract word labels
word_freq_matrix = np.array([[count for _, count in most_common_words]])  # 1 x N row of counts
# The matrix has a single row, so give it its own y label instead of reusing the word labels
generate_heatmap(word_freq_matrix, word_labels, "Word Frequency map (Top {} Words)".format(top_n_words), ylabels=["count"])

# Word Co-occurrence Heatmap for the first N words of the alphabetically sorted vocabulary
top_n_words_cooc = 10  # Set the number of vocabulary words to display
vectorizer = CountVectorizer(binary=True)
doc_term_matrix = vectorizer.fit_transform(text_data)  # Fit once and reuse the result
co_occurrence_matrix = doc_term_matrix.T.dot(doc_term_matrix)  # Documents shared by each word pair
word_cooc_labels = vectorizer.get_feature_names_out()
word_cooc_matrix = co_occurrence_matrix[:top_n_words_cooc, :top_n_words_cooc]
generate_heatmap(word_cooc_matrix.toarray(), word_cooc_labels[:top_n_words_cooc], "Word Co-occurrence map (First {} Vocabulary Words)".format(top_n_words_cooc))

# Text Similarity Heatmap
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(text_data)
text_similarity_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)
generate_heatmap(text_similarity_matrix, range(len(text_data)), "Text Similarity map")

# Print document indices and corresponding text data
print("Document Indices and Corresponding Text Data:")
for i, sentence in enumerate(text_data):
    print(f"{i}: {sentence}")

# Sequence Analysis Heatmap (word positions occupied in each sentence)
max_len = max(len(sentence) for sentence in tokenized_data)  # Width must fit the longest sentence
sequence_matrix = np.zeros((len(tokenized_data), max_len))
for i, sentence in enumerate(tokenized_data):
    sequence_matrix[i, :len(sentence)] = 1  # Mark every position that holds a word

# Print legend for Sequence Analysis Heatmap (Word Presence)
print("\nLegend for Sequence Analysis Heatmap (Word Presence):")
print("0: No word at this position (the sentence is shorter).")
print("1: A word is present at this position.")
print("-----------------------------------")
print("X axis: Word position within the sentence")
print("Y axis: Sentence index")

generate_heatmap(sequence_matrix, range(max_len), "Sequence Analysis Heatmap (Word Presence)")

********************************************************************************

# **Task 3 : Geospatial Visualization**  

Geospatial Visualization in the context of Natural Language Processing (NLP) refers to the visualization of location-based or geographic data that is derived from textual sources. This can help to present insights and patterns related to locations, movements, or any spatial context mentioned in the text.

Geospatial Visualization with NLP can be used to do the below tasks:

**Information Extraction:** NLP techniques are used to extract location-based information from textual sources. This could be place names, addresses, coordinates, or even contextual information related to places. Named Entity Recognition (NER) is a common NLP task for extracting such named entities, including locations.

**Geocoding:** After extracting place names, these need to be converted to geographic coordinates (latitude and longitude) for visualization. This process is called geocoding and often involves third-party services or datasets. Conversely, reverse geocoding converts coordinates back to human-readable place names.

**Spatial Analysis:** Once you have location-based data, you can perform spatial analysis. This can involve understanding proximity, calculating distances between points, determining clusters, or understanding movement patterns.

**Visualization:** The location-based data can be visualized on maps using markers, heatmaps, lines, or polygons. This helps in deriving insights related to spatial distribution, trends, or patterns.
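
As one small example of the spatial analysis step, the sketch below computes the great-circle distance between two points with the haversine formula, using only the standard library. The function name `haversine_km`, the mean Earth radius constant, and the hard-coded city-centre coordinates are assumptions for illustration; a real pipeline would take coordinates from a geocoder.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is not a perfect sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Approximate city-centre coordinates, hard-coded for illustration.
london = (51.5074, -0.1278)
edinburgh = (55.9533, -3.1883)
print(round(haversine_km(*london, *edinburgh)), "km")
```

For these two points the result is a little over 530 km, which is the straight-line distance rather than a road or rail distance.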

Applications include:

**Travel Blogs or Diaries:** Extract and visualize places mentioned in a travelogue to trace the journey on a map.

**News Analysis:** Identify and visualize the geographic distribution of news events or stories.

**Social Media Analysis:** Understand where users are talking about specific topics by analyzing geotagged tweets or posts.

**Literature or Historical Analysis:** Map places mentioned in historical documents or novels to get a spatial understanding of events or narratives.

Tools & Libraries: popular Python libraries include **geopy** for geocoding and **folium** for map-based visualization. GIS software such as **QGIS** or **ArcGIS** can be used for more advanced geospatial analysis and visualization.

In general, Geospatial Visualization in NLP involves extracting location information from textual data and presenting it visually on maps or other spatial formats, enabling richer analysis and insights related to geographic or spatial context.

The two code segments below, 3(a) and 3(b), illustrate the concept of Geospatial Visualization.


<><><><><><><><><><><><><><><><><><><>>
# **Task 3(a) : Geospatial Visualization without NLP**  

Let's approach this task as a solution to the following problem statement.

**Problem Statement: Geospatial Visualization of Personal Travel Diary**

**Background:**
In the age of digital content and blogging, many individuals document their travel experiences in digital formats such as blogs, online journals, or social media posts. These textual narrations often include dates and names of places visited. While these narratives are rich in detail, they lack an interactive and visual perspective that would allow readers or the authors themselves to visually trace their journey over time.

**Objective:**
Develop a tool that extracts mentions of dates and places from a given textual travel diary and then visualizes these visits on an interactive map. The visual representation should allow users to see the sequence of visits as they happened over time.

**Requirements:**
The tool should accept a multiline text input describing travel experiences with mentions of dates and places.
The tool should be able to recognize and extract date and place mentions from the given input.
For each recognized place, the tool should determine its geographical coordinates.
The tool should visualize the extracted places on an interactive map.
The visual representation on the map should be timestamped, allowing for an animated playback of the travel journey.
Each place on the map should have a distinct visual representation (e.g., different colors) to distinguish between the various visits.
The resultant map should be saved as an interactive HTML file for easy sharing and viewing.

**Constraints:**
The initial version of the tool should focus on places located in the UK.
The tool should respect rate limits when fetching geographical coordinates to avoid overloading geocoding services.
The tool should handle scenarios where a place's geographical coordinates cannot be determined.

**Desired Outcome:**
At the end of this project, a user should be able to provide a text-based travel diary and receive an interactive geospatial visualization in HTML format, illustrating their journey across the UK over time.

The code below is a solution to the problem stated above. It creates an interactive geospatial visualization that depicts the sequence of places a person visited in the UK, as mentioned in the input_text. The resulting map animates through the sequence of visits, with a differently colored point for each location.


Code description is provided below.

**Libraries Used:**

**folium:** A Python library for creating interactive maps.

**TimestampedGeoJson:** A plugin for folium to visualize data with timestamps.

**Nominatim:** A geocoder class from the geopy library that queries OpenStreetMap's Nominatim service to convert place names to latitude and longitude.

**datetime and time:** Python built-in libraries for date and time operations.

**re:** Python built-in library for regular expression operations.

**input_text:** A multi-line string containing dates followed by place names.

The **re.findall()** function uses a regular expression to extract dates, verbs (e.g., "visited", "went"), and place names from **input_text**.
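
To see what this kind of extraction returns, here is a minimal sketch on two sentences. The pattern is an illustrative variant with only three verb alternatives, and each alternative includes its preposition ("went to", "was in") so that the place group holds only the place name; it is not meant as the exact pattern used in the code cell.

```python
import re

text = "On 2023-01-01, I visited London. On 2023-01-02, I went to Edinburgh."

# Three capture groups: ISO date, verb phrase, place name (up to the next full stop).
pattern = r"On (\d{4}-\d{2}-\d{2}), I (visited|went to|was in) (.*?)\."

for date_str, verb, place in re.findall(pattern, text):
    print(date_str, "|", verb, "|", place)
```

Because the pattern has three groups, `re.findall()` returns a list of three-element tuples, one per matched sentence.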

For converting Place Names to Latitude and Longitude, the code initializes the **Nominatim** geolocator. For each extracted place name, the geolocator gets the latitude and longitude. A **time.sleep(1)** is used to prevent hitting the geolocator's rate limits.
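
The rate-limiting and failure-handling pattern can be sketched independently of any particular geocoding service. In the example below, `geocode_all` and the `FAKE_COORDS` lookup table are hypothetical stand-ins so that the code runs offline; with geopy you would pass a function that calls the geolocator instead, and keep the delay at one second or more.

```python
import time

def geocode_all(places, geocode, delay_seconds=1.0):
    """Geocode each place, pausing between requests and recording failures."""
    found, missing = {}, []
    for place in places:
        try:
            result = geocode(place)
        except Exception as exc:  # e.g. a network or service error
            print(f"Error geocoding {place}: {exc}")
            result = None
        if result is None:
            missing.append(place)  # keep track of places that could not be resolved
        else:
            found[place] = result
        time.sleep(delay_seconds)  # be polite to the geocoding service
    return found, missing

# Made-up lookup table standing in for a real geocoder (approximate coordinates).
FAKE_COORDS = {"London": (51.5074, -0.1278), "Oxford": (51.7520, -1.2577)}

found, missing = geocode_all(["London", "Atlantis", "Oxford"], FAKE_COORDS.get, delay_seconds=0)
print(found)
print(missing)
```

Returning the unresolved names separately makes it easy to report them to the user instead of silently dropping them.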

A base map centered around the UK is created using **folium.Map()**.

To prepare the timestamped data, the code pairs each location and its associated date with a color from a predefined list of colors.
The **data** dictionary, formatted as GeoJSON, contains each place as a feature with a point geometry (latitude and longitude) and properties (timestamp and visualization style).
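
A single feature of this kind can be built as follows. Note that GeoJSON stores coordinates as [longitude, latitude], the reverse of the (latitude, longitude) order that geocoders usually return. The `make_feature` helper and the hard-coded London coordinates are assumptions for illustration.

```python
import json
from datetime import datetime

def make_feature(date, lat, lon, color):
    """Build one timestamped GeoJSON point feature."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [lon, lat],  # GeoJSON order: longitude first
        },
        "properties": {
            "times": [date.strftime("%Y-%m-%dT%H:%M:%S")],
            "icon": "circle",
            "iconstyle": {"fillColor": color, "fillOpacity": 0.8, "radius": 7},
        },
    }

feature = make_feature(datetime(2023, 1, 1), 51.5074, -0.1278, "red")
print(json.dumps(feature["geometry"]))
print(feature["properties"]["times"])
```

Swapping the two coordinates is a common mistake that places the marker in the wrong part of the world, so it helps to keep the conversion in one place.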

To add the timestamped data to the map, the **TimestampedGeoJson** plugin visualizes the data points with timestamps on the base map. The period is set to "P1D" (an ISO 8601 duration meaning one day), so the timeline advances one day at a time. The remaining parameters control the appearance and behaviour of the timeline slider.

Finally, the interactive map with the visualized timeline is saved to an HTML file named **"Visited_Places_Geospatial_Visualization_v2.html"**.


In [None]:
#********************************
# Developer: Eaby Kollonoor Babu
# Version: 2.2
# Added: 2023-08-09
# Contact:eaby.asha@gmail.com
#********************************


# 3(a) Geospatial Visualization without NLP

import folium
from folium.plugins import TimestampedGeoJson
from geopy.geocoders import Nominatim
from datetime import datetime
import time
import re

# Sample text input with 10 places in the UK
input_text = """
On 2023-01-01, I visited London. On 2023-01-02, I went to Edinburgh.
On 2023-01-03, I traveled to Cardiff. On 2023-01-04, I enjoyed Belfast.
On 2023-01-05, I explored Liverpool. On 2023-01-06, I saw Manchester.
On 2023-01-07, I was in Birmingham. On 2023-01-08, I roamed around Bristol.
On 2023-01-09, I stopped by Cambridge. On 2023-01-10, I checked out Oxford.
"""

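# Extract dates, verb phrases, and places from the text input.
# Each verb alternative includes its preposition ("went to", "was in") so the place group holds only the name.
places_dates = re.findall(r"On (\d{4}-\d{2}-\d{2}), I.*? (visited|went to|traveled to|enjoyed|explored|saw|was in|roamed around|stopped by|checked out) (.*?)\.", input_text)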

# Convert place names to latitude and longitude
geolocator = Nominatim(user_agent="timelineGeocoder_v4")
locations = []

for date_str, verb, place in places_dates:
    date = datetime.strptime(date_str, "%Y-%m-%d")
    location = geolocator.geocode(place + ", United Kingdom")
    if location:
        locations.append((date, (location.latitude, location.longitude)))
    time.sleep(1)

# Create a base map
m = folium.Map(location=[54, -3], zoom_start=6)

# List of CSS color names used as marker fill colors ("darkpurple" is not a CSS color, so "indigo" is used)
colors = ["red", "blue", "green", "purple", "orange", "darkred", "darkblue", "darkgreen", "cadetblue", "indigo", "pink"]

# Create data for TimestampedGeoJson
data = {
    "type": "FeatureCollection",
    "features": []
}

for (date, coords), color in zip(locations, colors):
    feature = {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [coords[1], coords[0]]
        },
        "properties": {
            "times": [date.strftime("%Y-%m-%dT%H:%M:%S")],
            "icon": "circle",
            "iconstyle": {
                "fillColor": color,
                "fillOpacity": 0.8,
                "stroke": "true",
                "radius": 7
            }
        }
    }
    data["features"].append(feature)

# Add TimestampedGeoJson to the map
TimestampedGeoJson(
    data,
    period="P1D",
    add_last_point=True,
    auto_play=True,
    loop=False,
    max_speed=1,
    loop_button=True,
    date_options='YYYY-MM-DD',
    time_slider_drag_update=True
).add_to(m)

m.save("Visited_Places_Geospatial_Visualization_v2.html")


<><><><><><><><><><><><><><><><><><><>>
# **Task 3(b) : Geospatial Visualization with NLP**  

**Problem Statement:**
Develop an application that takes in textual descriptions of a traveler's journey, with dates, places visited, and sentiments about each location.

**The application should:**
Extract the essential data, i.e., dates, locations, and sentiments from the textual descriptions.
Analyze the sentiments associated with each visited location using natural language processing techniques.
Geocode the locations to get their corresponding geographical coordinates.
Visually represent the journey on a map, indicating the sequence of visits with directional lines and using color-coded markers based on the analyzed sentiments.


**Objectives:**
**Data Extraction:** Extract dates, locations, and sentiments from raw text using regular expressions.

**Sentiment Analysis:** Employ natural language processing tools to categorize sentiments as positive, negative, or neutral based on textual descriptions.

**Geocoding:** Convert textual location descriptors into geographical coordinates.

**Visualization:** Create an interactive map:
Place markers on the map for each location.
Color-code markers based on sentiment: green for positive, yellow for neutral, and red for negative.
Draw interconnecting lines between markers to represent the journey's sequence, with arrows to indicate direction.

**Deliverables:**
An interactive HTML map showcasing the journey and associated sentiments.
A user-friendly interface for inputting travel journal descriptions.

**Challenges:**
Handling variations in textual descriptions, e.g., different phrases used to denote visiting a place.
Ensuring accurate sentiment analysis, given the subjective nature of sentiments.
Addressing potential geocoding inaccuracies or failures.
Ensuring that the visualization remains clear and interpretable even if there are many locations or overlapping routes.

**Benefits:**
By implementing this application, users can get a comprehensive visual overview of their journey across different places, enriched by their feelings and experiences. It can serve as an enhanced digital travel diary, aiding in reminiscing or sharing travel experiences with others.

**Code Description:**

**folium:** This library is used for rendering leaflet maps.

**datetime and time:** To work with date and time.

**re:** For regular expression operations.

**nltk.sentiment:** Contains the sentiment analysis tool **'SentimentIntensityAnalyzer'**.

**nltk:** Library for natural language processing.

**geopy.geocoders:** To geocode locations into latitude and longitude.

**folium.plugins:** Plugins to beautify and enhance folium visualizations.

For sentiment analysis, we use NLTK's **'SentimentIntensityAnalyzer'** (VADER). It scores the description of each place visited, and the compound score determines whether the sentiment is classified as positive (above 0.05), negative (below -0.05), or neutral (otherwise).
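
The classification rule can be sketched as a small function. The 0.05 cut-offs follow this task's code; the function name `classify_compound` and the example scores are hypothetical (hand-picked values, not actual VADER output).

```python
def classify_compound(score, threshold=0.05):
    """Map a compound sentiment score in [-1, 1] to a sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

# Hand-picked scores for illustration only.
for score in (0.62, -0.40, 0.0, 0.05):
    print(score, classify_compound(score))
```

The comparisons are strict, so a score exactly at the threshold (0.05 or -0.05) counts as neutral.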

For data extraction, a regular expression pattern extracts dates, places, and descriptions from the **'input_text'**. For each extracted place, the **'Nominatim'** geocoder looks up the latitude and longitude, which are used later for plotting on the map.
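
It helps to see exactly what such a pattern captures before passing the pieces on. The sketch below uses a shortened pattern with only two verb alternatives (an assumption for brevity) on two sample sentences. The place group stops at the first comma, so the region name ("Scotland", "England") ends up at the start of the description that is later scored for sentiment.

```python
import re

text = ("On 2023-01-02, I went to Edinburgh, Scotland, and I found it very boring. "
        "On 2023-01-06, I saw Manchester, England, which was okay.")

# Groups: date, place (up to the first " and ", ", " or "."), description (up to the next ".").
pattern = r"On (\d{4}-\d{2}-\d{2}), I.*?(?:went to|saw) (.*?)(?: and |, |\.)(.*?)(?:\.|$)"

for date_str, place, description in re.findall(pattern, text):
    print(date_str, "|", place, "|", description)
```

The verb alternatives sit in a non-capturing group `(?:...)`, so each match yields three values rather than four.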

For Map creation, a base folium map centred around the UK is created.
For each location (sorted by date), a marker is placed on the map. The number on the marker indicates the order of visits, and its color reflects the sentiment (green for positive, yellow for neutral, and red for negative).
Lines are drawn to connect the places in the order they were visited. **'PolyLineTextPath'** adds arrows along these lines to show the direction of the journey.

Finally, the map visualization is saved as an HTML file named **"Enhanced_Sentiment_Geospatial_Visualizations_v3.html"**.


In [None]:
#********************************
# Developer: Eaby Kollonoor Babu
# Version: 2.2
# Added: 2023-08-19
# Contact:eaby.asha@gmail.com
#********************************

# Task 3(b) : Geospatial Visualization with NLP

import folium
from datetime import datetime
import time
import re
from folium.plugins import PolyLineTextPath, BeautifyIcon
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk
from geopy.geocoders import Nominatim

# Sample text input with sentiments for each place
input_text = """
On 2023-01-01, I visited London, United Kingdom, and it was absolutely breathtaking.
On 2023-01-02, I went to Edinburgh, Scotland, and I found it very boring.
On 2023-01-03, I traveled to Cardiff, Wales, which was quite delightful.
On 2023-01-04, I enjoyed Belfast, Northern Ireland, although it was a bit underwhelming.
On 2023-01-05, I explored Liverpool, England, and it was an amazing experience.
On 2023-01-06, I saw Manchester, England, which was okay.
On 2023-01-07, I was in Birmingham, England, and didn't like it much.
On 2023-01-08, I roamed around Bristol, England, which was exciting.
On 2023-01-09, I stopped by Cambridge, England, which was decent.
On 2023-01-10, I checked out Oxford, England, and it was fabulous.
"""

# Initialize sentiment analysis tool
nltk.download('vader_lexicon', quiet=True)
sia = SentimentIntensityAnalyzer()

# Extract sentiments, places, and dates from the text input
pattern = r"On (\d{4}-\d{2}-\d{2}), I.*?(?:visited|went to|traveled to|enjoyed|explored|saw|was in|roamed around|stopped by|checked out) (.*?)(?: and |, |\.)(.*?)(?:\.|$)"
places_dates_descriptions = re.findall(pattern, input_text)

# Set up the Nominatim geocoder
geolocator = Nominatim(user_agent="timelineGeocoder_v10", timeout=10)

locations = []
for date_str, place, description in places_dates_descriptions:
    date = datetime.strptime(date_str, "%Y-%m-%d")

    sentiment = "neutral"
    sentiment_score = sia.polarity_scores(description)["compound"]
    if sentiment_score > 0.05:
        sentiment = "positive"
    elif sentiment_score < -0.05:
        sentiment = "negative"

    try:
        location = geolocator.geocode(place + ", United Kingdom")
        if location:
            locations.append((date, place.strip(), (location.latitude, location.longitude), sentiment))
    except Exception as e:
        print(f"Error geocoding {place}: {str(e)}")

    time.sleep(1)  # Adjust the waiting time if needed to be polite to the service

# Create a base map
m = folium.Map(location=[54, -3], zoom_start=6, tiles="CartoDB Positron")

# Color mapping based on sentiment
color_map = {
    "positive": "#00FF00",  # Green
    "neutral": "#FFFF00",  # Yellow
    "negative": "#FF0000"  # Red
}

coords_list = []
for idx, (date, place, coords, sentiment) in enumerate(sorted(locations, key=lambda x: x[0])):
    folium.Marker(
        location=coords,
        icon=BeautifyIcon(
            icon_shape='marker',
            border_color=color_map[sentiment],
            border_width=1,
            text_color=color_map[sentiment],
            number=idx+1,
            inner_icon_style='margin-top:0px;'
        ),
        popup=f"{date.strftime('%Y-%m-%d')} - {place} ({sentiment})"
    ).add_to(m)

    coords_list.append(coords)

# Draw interconnecting lines
polyline = folium.PolyLine(coords_list, color="#00ABDC", weight=4, opacity=0.7).add_to(m)
PolyLineTextPath(polyline, '→', offset=10, repeat=True, font_size=18, font_weight='bold').add_to(m)

m.save("Enhanced_Sentiment_Geospatial_Visualizations_v3.html")