# Using an LLM to Analyze Sense of Fictional Place and Make Story Maps

### Authors: Bo Zhao, Rin Huang
### Date: June 1, 2025

This Colab notebook demonstrates how to use a large language model (LLM), specifically Gemini 2.0 Flash or Gemini 2.5 Pro (preview), to perform spatial narrative extraction and sense-of-place analysis on a geographic text. Focused on Italo Calvino's _Invisible Cities_, the workflow includes:

- Extracting landmark data from a PDF using LLM prompts

- Structuring that data into a CSV with geolocation and narrative attributes

- Converting the CSV into a GeoJSON file

- Generating a scrollable, HTML-based story map powered by MapLibre GL JS

Each location is enriched with thematic information and rendered as a colored point on a fictional map. *The generated HTML file requires manual modification and is not the final demo page.*

In [2]:
# --- 1. Install dependencies ---
!pip install google-generativeai



In [3]:
# --- 2. Mount Google Drive ---
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
# --- 3. Set Gemini API Key ---
import os
import google.generativeai as genai
import json
import pandas as pd

In [5]:
genai.configure(api_key='')  # Paste your API key here. You can apply for a Gemini key from Google.
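
Pasting the key directly into the cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key has been stored in an environment variable beforehand; the variable name `GEMINI_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original notebook:

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var!r} is empty or unset; "
            "set it before calling genai.configure()."
        )
    return key


# Usage (assumes the variable was set earlier in the session):
# genai.configure(api_key=load_api_key())
```

Failing early with a readable message is usually easier to debug than an authentication error raised later by the API call.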

In [31]:
# --- 4. Read the PDF as binary content ---
pdf_path = "/content/drive/MyDrive/Calvino_Italo_Invisible_Cities.pdf"
with open(pdf_path, "rb") as f:
    pdf_content = f.read()
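
Before sending the bytes to the model, it can help to confirm that the file on Drive really is a PDF and to see how large it is. The sketch below is a heuristic only; `looks_like_pdf` and `describe_payload` are hypothetical helper names, and the check relies on the fact that PDF files normally begin with the marker `%PDF-`:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check: PDF files normally begin with the ASCII marker '%PDF-'."""
    return data[:5] == b"%PDF-"


def describe_payload(data: bytes) -> str:
    """Summarize the size and apparent type of the bytes read from disk."""
    size_mb = len(data) / (1024 * 1024)
    kind = "PDF" if looks_like_pdf(data) else "not a PDF (unexpected header)"
    return f"{size_mb:.2f} MB, {kind}"


# Usage with the variable from the cell above:
# print(describe_payload(pdf_content))
```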

In [32]:
# --- 5. Prompt for extracting structured spatial data ---
prompt_text = '''
Read the provided PDF file of Italo Calvino's "Invisible Cities". Your objective is to conduct a literary analysis by deconstructing each described city/conceptual place into key analytical components. **It is absolutely crucial that your output consists of concise analytical labels, keywords, or very brief summaries formulated in your own synthesis. To prevent triggering content filters, DO NOT reproduce verbatim passages or long descriptive sentences from the source text. Focus on extracting and formulating metadata and analytical insights.**

For each distinct city/place, return the result as a comma-separated table (CSV format) with the following columns:

city_id,city_name,city_s_defining_characteristics_keywords,key_symbolic_features_list,latitude,longitude,dominant_theme_keywords,theme_explanation_summary,prevailing_mood_tags,observed_rhetorical_strategies

Explanation of columns (generate concise, analytical data, not prose):
- city_id: A unique sequential identifier (e.g., 1, 2, 3...).
- city_name: The proper name of the city (e.g., Zaira, Armilla).
- city_s_defining_characteristics_keywords: **List 3-5 keywords or very short phrases (e.g., "high bastions_accumulated past_static") that capture the city's core structural or conceptual premise. Use underscores `_` instead of commas within this field's content.**
- key_symbolic_features_list: **List up to 5-7 key symbolic elements as keywords or short noun phrases (e.g., "silver domes_bronze statues_crystal theater_golden cock"). Use underscores `_` to separate items if listing multiple keywords where commas would naturally occur.**
- latitude: Output 'N/A'.
- longitude: Output 'N/A'.
- dominant_theme_keywords: **Identify 1-3 keywords for the primary abstract theme (e.g., "Memory_History_Burden").**
- theme_explanation_summary: **In ONE very concise sentence (max 15-20 words), explain the connection between the city's characteristics and its dominant theme, using analytical language. Avoid any direct quotes.**
- prevailing_mood_tags: **Provide 1-3 descriptive adjectives or short tags for the mood (e.g., "Nostalgic_Melancholic_Static").**
- observed_rhetorical_strategies: **List 1-3 key rhetorical devices or narrative techniques observed (e.g., "Metaphor_Detailed Inventory_Personification"). Use underscores `_` for internal separation if needed.**

**IMPORTANT: Strict CSV Formatting Rules to Follow:**
1.  Each field (column value) in a row must be separated from the next by a single comma (`,`). This comma is the **main delimiter between columns**.
2.  **Internal Comma Replacement:** Within the *content* of any given field, if the original analytical keywords/phrases would naturally use a comma (`,`) for separation (e.g., a list of keywords), you **MUST** replace that internal comma with an underscore character (`_`).
3.  **Text Field Quoting:** After performing internal comma replacement, all fields containing text (this includes `city_name` and all other descriptive/analytical columns) **MUST** be enclosed in a pair of double quotes (`"`). `latitude`, `longitude` (when "N/A"), and `city_id` (when a plain number) do not need these enclosing quotes.
4.  **Escaping Internal Double Quotes:** If a field's content (after internal comma replacement) itself contains a double quote character (`"`), that internal double quote **MUST** be escaped by replacing it with two double quotes (`""`). The entire field must then also be enclosed in its own pair of double quotes. For example: `"Keywords_like_""Quoted Term""_etc"`
5.  Ensure every data row has exactly 10 fields.
6.  Pay very close attention to these formatting rules.

Example (output should be very concise and keyword-driven):
city_id,city_name,city_s_defining_characteristics_keywords,key_symbolic_features_list,latitude,longitude,dominant_theme_keywords,theme_explanation_summary,prevailing_mood_tags,observed_rhetorical_strategies
1,"Zaira","High bastions_accumulated past_static_relational space","Memories as objects_city as hand lines_past containment_no innovation_usurper's gunboat story",N/A,N/A,"Memory_History_Burden","City's form is dictated by its inescapable past_ hindering present experience or change.","Nostalgic_Melancholic_Static","Metaphor_Detailed Inventory_Personification"
2,"Anastasia","Desire awakening_simultaneous fulfillment_beautiful trap_concentric canals","Precious stones_kites_perfumes_banquets_enslaving desires_agate_onyx_chrysoprase",N/A,N/A,"Desire_Enslavement_Illusion","Apparent total desire satisfaction paradoxically results in the inhabitants' complete enslavement to the city.","Deceptive_Enticing_Paradoxical","Sensory Detail_Juxtaposition_Irony"

Do not include any notes or explanation outside of the CSV formatted text. Ensure all requested columns are present. Output only the CSV content, starting with the header line.
'''
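
Formatting rules 2 to 4 in the prompt correspond to ordinary CSV quoting, so a conforming row can be read with Python's standard `csv` module. The sketch below uses a made-up row (not model output) to show what a reader sees after parsing:

```python
import csv
import io

EXPECTED_FIELDS = 10  # rule 5 in the prompt

# Hypothetical row written to follow rules 2-4 (underscores, quoting, "" escapes).
sample = (
    '7,"Example City","Keyword one_keyword two","Feature_with_""quoted term""",'
    'N/A,N/A,"Theme_A","One short summary sentence.","Calm","Metaphor"\n'
)

row = next(csv.reader(io.StringIO(sample)))

print(len(row) == EXPECTED_FIELDS)  # field count check
print(row[3])                       # doubled quotes collapse to single quotes
print(row[2].split("_"))            # underscore-separated keywords become a list
```

Because commas inside a field are replaced by underscores, a row stays splittable even if the model forgets the enclosing quotes, which is presumably why the prompt asks for both.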

In [33]:
# model = genai.GenerativeModel("gemini-2.0-flash")  # Gemini 2.0 Flash
model = genai.GenerativeModel("models/gemini-2.5-pro-preview-05-06")  # Gemini 2.5 Pro (preview)
response = model.generate_content(
    [
        {"text": prompt_text},
        {"mime_type": "application/pdf", "data": pdf_content},
    ]
)

In [35]:
# --- 6. Save the CSV response to a file ---
csv_text = response.text.strip()

# Save CSV to file
csv_path = "Invisible_Cities.csv"
with open(csv_path, "w", encoding="utf-8") as f:
    f.write(csv_text)

print("\n✅ CSV file saved as 'Invisible_Cities.csv'\nDownload it below:")
from google.colab import files
files.download(csv_path)


✅ CSV file saved as 'Invisible_Cities.csv'
Download it below:
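
The cell above writes the response to disk unchanged. Models sometimes wrap their output in a Markdown code fence despite instructions, or return rows with the wrong number of fields, so a light check before saving may be useful. This is a sketch under those assumptions; `strip_code_fence` and `rows_with_wrong_width` are hypothetical helpers, not part of the original notebook:

```python
import csv
import io

FENCE = "`" * 3  # the Markdown code-fence marker


def strip_code_fence(text: str) -> str:
    """Drop a leading fence line (with optional language tag) and a trailing fence line."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def rows_with_wrong_width(csv_text: str, expected: int = 10) -> list[int]:
    """Return 1-based row numbers whose field count differs from `expected`."""
    reader = csv.reader(io.StringIO(csv_text))
    return [i for i, row in enumerate(reader, start=1) if row and len(row) != expected]


# Usage with the variable from the cell above:
# csv_text = strip_code_fence(response.text)
# print("Rows to inspect:", rows_with_wrong_width(csv_text))
```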



In [None]:
import pandas as pd
import json
import random
import io
from google.colab import files

# --- 7.1 Upload the CSV File ---
print("Please upload your CSV file (e.g., 'Invisible_Cities.csv')...")
try:
    uploaded = files.upload()
    csv_filename = next(iter(uploaded))
    csv_content = uploaded[csv_filename]
    print(f"\n✅ SUCCESS: File '{csv_filename}' uploaded successfully!")
except (StopIteration, NameError):
    print("\n❌ ERROR: No file was uploaded. Please try again.")
    raise SystemExit("No file uploaded.")

# --- 7.2 Define the output path and a conceptual center for each theme ---
geojson_output_path = "Invisible_Cities_Conceptual.geojson"
theme_centers = {
    "Duality":          {"lat": 44.47, "lon": -121.21}, "Perception":       {"lat": 43.18, "lon": -123.33},
    "Memory":           {"lat": 45.98, "lon": -122.95}, "Desire":           {"lat": 42.17, "lon": -119.58},
    "Order":            {"lat": 46.13, "lon": -119.23}, "Happiness":        {"lat": 47.78, "lon": -124.36},
    "Time":             {"lat": 41.01, "lon": -122.56}, "Boundary":         {"lat": 48.0, "lon": -121.0},
    "Change":           {"lat": 42.84, "lon": -117.8}, "Conflict":         {"lat": 49.33, "lon": -122.38},
    "Construction":     {"lat": 40.42, "lon": -124.96}, "Consumerism":      {"lat": 48.33, "lon": -118.06},
    "Domesticity":      {"lat": 50.56, "lon": -123.4}, "Hiddenness":       {"lat": 39.42, "lon": -120.67},
    "History":          {"lat": 50.11, "lon": -119.53}, "Imagination":      {"lat": 52.0, "lon": -124.73},
    "Incompleteness":   {"lat": 38.4, "lon": -123.12}, "Justice":          {"lat": 50.93, "lon": -117.27},
    "Language":         {"lat": 53.48, "lon": -122.15}, "Life":             {"lat": 37.78, "lon": -119.64},
    "Perspective":      {"lat": 53.11, "lon": -118.73}, "Possibility":      {"lat": 55.08, "lon": -124.01},
    "Precarity":        {"lat": 36.85, "lon": -122.01}, "Relationships":    {"lat": 54.4, "lon": -116.14},
    "Roles":            {"lat": 56.88, "lon": -120.94}, "Sameness":         {"lat": 35.83, "lon": -119.14},
    "Semiotics":        {"lat": 44.5, "lon": -124.88}, "Choice":           {"lat": 35.1, "lon": -124.58},
    "Default":          {"lat": 34.0, "lon": -128.0}
}
spread = 0.5

# --- 7.3 Data processing ---
print("\nINFO: Data processing...")
df = pd.read_csv(io.BytesIO(csv_content))
features = []
if not df.empty:
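    for _, row in df.iterrows():
        # Map the city's primary theme to a conceptual anchor location
        dominant_keywords = row.get('dominant_theme_keywords')
        if not isinstance(dominant_keywords, str) or not dominant_keywords.strip():
            dominant_keywords = 'Default'  # empty cells are read as NaN (a float), which has no .split()
        primary_theme = dominant_keywords.split('_')[0].strip()
        center = theme_centers.get(primary_theme, theme_centers["Default"])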
        new_lon = center["lon"] + random.uniform(-spread, spread)
        new_lat = center["lat"] + random.uniform(-spread, spread)

        # --- Turn pandas NaN into JSON-compatible null ---
        raw_properties = row.to_dict()
        properties = {}
        for key, value in raw_properties.items():
            if pd.isna(value):
                properties[key] = None
            else:
                properties[key] = value

        properties['primary_theme'] = primary_theme

        feature = {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [new_lon, new_lat]},
            "properties": properties
        }
        features.append(feature)
    print(f"✅ SUCCESS: {len(features)} rows processed.")
else:
    print("⚠️ WARNING: CSV is empty.")


# --- 7.4 Save the GeoJSON file ---
if features:
    geojson_data = {"type": "FeatureCollection", "features": features}
    try:
        with open(geojson_output_path, "w", encoding="utf-8") as f:
            json.dump(geojson_data, f, ensure_ascii=False, indent=2)
        print(f"\nINFO: GeoJSON written in Colab as '{geojson_output_path}'.")
        print("INFO: Preparing to download...")
        files.download(geojson_output_path)
        print("✅ SUCCESS: File download triggered.")
    except Exception as e:
        print(f"❌ ERROR: Writing or downloading the file failed. Details: {e}")
else:
    print("⚠️ WARNING: No data was processed, so no GeoJSON file was written.")

Please upload your CSV file (e.g., 'Invisible_Cities.csv')...
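
The placement logic in step 7.3 takes the first keyword of `dominant_theme_keywords`, looks up that theme's center, and adds a uniform random offset of up to `spread` degrees. The sketch below isolates that step in a function with a seeded random generator, so repeated runs produce the same layout; `place_city` is a hypothetical name and only two of the theme centers are copied in:

```python
import random

# Two entries copied from theme_centers above; the rest are omitted for brevity.
centers = {
    "Memory": {"lat": 45.98, "lon": -122.95},
    "Default": {"lat": 34.0, "lon": -128.0},
}


def place_city(theme_keywords, centers, spread=0.5, rng=None):
    """Return (primary_theme, lon, lat) with uniform jitter around the theme center."""
    rng = rng or random.Random()
    # Non-string values (e.g. NaN from an empty CSV cell) fall back to "Default".
    if isinstance(theme_keywords, str) and theme_keywords.strip():
        primary = theme_keywords.split("_")[0].strip()
    else:
        primary = "Default"
    # Themes without an entry keep their own name but are placed at the Default center.
    center = centers.get(primary, centers["Default"])
    lon = center["lon"] + rng.uniform(-spread, spread)
    lat = center["lat"] + rng.uniform(-spread, spread)
    return primary, lon, lat


rng = random.Random(42)  # fixed seed so repeated runs give the same layout
print(place_city("Memory_History_Burden", centers, rng=rng))
print(place_city("Knowledge_Perspective", centers, rng=rng))  # theme not in centers
```

Without a seed, every run of the notebook scatters the cities differently, which makes it hard to compare two generated maps.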


In [30]:
# --- 8. Let Gemini generate MapLibre StoryMap HTML ---

import google.generativeai as genai
import json # Though not directly used in this specific block, often useful.
from google.colab import files
import traceback


try:
    # model = genai.GenerativeModel("models/gemini-1.5-pro-latest")  # Alternative: a generally available Pro model
    model = genai.GenerativeModel("models/gemini-2.5-pro-preview-05-06")  # Preview model used in this notebook
    print(f"Using model: {model.model_name}")
except Exception as e:
    print(f"Error initializing model: {e}")
    print("Please ensure you have configured your API key and the model name is correct.")
    # Stop execution if model can't be initialized
    raise SystemExit("Model initialization failed.")

# --- 8.1 Define the name of your GeoJSON file for "Invisible Cities" ---
# This MUST match the filename of the GeoJSON you generated in the previous step.
your_geojson_filename = "Invisible_Cities_Conceptual.geojson" # <--- VERIFY THIS FILENAME

# --- 8.2 Define the Prompt for generating the MapLibre StoryMap HTML ---
# This prompt instructs Gemini on how to create the index.html file
# It's tailored for your conceptual "Invisible Cities" data.

prompt_html_invisible_cities = f'''
Create a complete, single HTML file called index.html that uses the latest version of MapLibre GL JS (from a CDN like https://unpkg.com/maplibre-gl@latest/dist/maplibre-gl.js and corresponding CSS https://unpkg.com/maplibre-gl@latest/dist/maplibre-gl.css) to display a story map based on conceptual cities from a GeoJSON file.

The GeoJSON data source will be a local file named '{your_geojson_filename}', located in the same folder as this index.html file.
Each feature in the GeoJSON file represents a conceptual city and should be shown as a circular marker on the map.
The map should not assume real-world geography for these points; they represent an abstract spatial arrangement.

Use the following example GeoJSON feature (from '{your_geojson_filename}') to understand the data structure and properties to access:
{{
  "type": "Feature",
  "geometry": {{
    "type": "Point",
    "coordinates": [0.1, -0.2] // Example of one of the conceptual (arbitrary) coordinates
  }},
  "properties": {{
    "city_id": 3,
    "city_name": "Dorothea",
    "city_s_defining_characteristics_keywords": "Two descriptive modes_quantitative vs narrative_limitations of perspective",
    "key_symbolic_features_list": "Aluminum towers_drawbridge gates_green canals_nine quarters_bergamot_sturgeon roe_astrolabes_hurrying people_fine-toothed women_trumpeting soldiers_colored banners",
    "dominant_theme_keywords": "Knowledge_Perspective_Subjectivity",
    "theme_explanation_summary": "Dorothea's essence is explored via contrasting narratives_revealing how objective and subjective views limit full understanding.",
    "prevailing_mood_tags": "Nostalgic_Contemplative_Dualistic",
    "observed_rhetorical_strategies": "Contrasting Modes_Juxtaposition_Narrative Framing"
  }}
}}

The HTML page layout should be:
- A full-screen view.
- A fixed MapLibre map occupying approximately 60-70% of the viewport width on the right side.
- A scrollable sidebar on the left (approx. 30-40% width) containing sections, one for each conceptual city.

Each section in the sidebar should clearly display the following properties for the corresponding city:
- `city_name` (as a prominent heading, e.g., <h3>).
- `city_s_defining_characteristics_keywords` (labeled appropriately).
- `key_symbolic_features_list` (labeled appropriately).
- `dominant_theme_keywords` (labeled appropriately).
- `theme_explanation_summary` (labeled appropriately).
- `prevailing_mood_tags` (labeled appropriately).
- `observed_rhetorical_strategies` (labeled appropriately).
- Instead of images from Unsplash, display a simple placeholder text like "Conceptual representation for [city_name]" or a decorative horizontal rule for each city's image area.

Map Interaction:
- When a city's section in the sidebar is clicked (or scrolled into prominent view, if easily implemented), the map should smoothly "fly to" and center on that city's marker.
- Clicking a marker on the map should highlight or scroll to the corresponding section in the sidebar.

Basemap and View:
- Use a very simple or abstract basemap. Since the coordinates are conceptual (e.g., small numbers like [0.1, -0.2], [0.2, -0.2], etc., typically within a small range like 0-5 units for x and y), a complex geographical map is not suitable.
- Consider a plain colored background for the map (e.g., light gray #f0f0f0) or a minimal style that does not require an API key.
- The initial map view (center and zoom) should be configured to appropriately display all the conceptual points. For example, if points are in a grid from roughly (0,0) up to (a few units, a few units), center the map there and set an appropriate zoom level. Ensure the map is pannable and zoomable by the user.

Technical details:
- Ensure all JavaScript and CSS are embedded within the single HTML file or linked from CDNs. No external local JS/CSS files.
- Use modern HTML5 and CSS3 best practices for layout and styling. Make the sidebar scrollable if its content exceeds the viewport height.
- The map should initialize properly when the HTML page is loaded.

Return ONLY the complete HTML code for the `index.html` file. Do not include any other explanatory text or markdown formatting around the HTML code.
'''

# --- 8.3 API Call to Generate HTML and Save to File ---
html_output_filename = "index_invisible_cities.html"

print(f"⏳ Generating '{html_output_filename}' using the Gemini API...")
print("   This may take a minute or more depending on the model's response time...")
print(f"   Using GeoJSON source: '{your_geojson_filename}' in the prompt.")

try:
    # Sending the request to the Gemini API
    response_html = model.generate_content(
        prompt_html_invisible_cities,
        request_options={'timeout': 600.0} # 600 seconds = 10 minute timeout
    )

    # Saving the generated HTML content to a file
    # It's good practice to check if the response has text before writing
    if response_html.parts:
        generated_text = "".join(part.text for part in response_html.parts if hasattr(part, 'text'))
        with open(html_output_filename, "w", encoding="utf-8") as f:
            f.write(generated_text)

        print(f"\n✅ '{html_output_filename}' has been successfully generated and saved.")

        # Providing a download link for the generated HTML file in Colab
        print("   Download it below:")
        files.download(html_output_filename)
    else:
        print(f"\n❌ Error: Gemini API response did not contain any text parts to save for '{html_output_filename}'.")
        if response_html.prompt_feedback:
            print(f"   Prompt Feedback: {response_html.prompt_feedback}")
        # You might want to print the full response to debug if parts are missing
        # print(f"   Full API Response: {response_html}")


except Exception as e:
    print("\n❌ An error occurred during HTML generation or file saving.")
    print(f"   Error details: {e}")
    traceback.print_exc()
    print("\n   Troubleshooting tip: Check if the error message mentions 'finish_reason'.")
    print("   If it does, it might be a content safety filter (like recitation), a token limit, or another API issue.")
    print("   Ensure your API key is configured and the model name is correct and available.")

Using model: models/gemini-2.5-pro-preview-05-06
⏳ Generating 'index_invisible_cities.html' using the Gemini API...
   This may take a minute or more depending on the model's response time...
   Using GeoJSON source: 'Invisible_Cities_Conceptual.geojson' in the prompt.

✅ 'index_invisible_cities.html' has been successfully generated and saved.
   Download it below:
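
The prompt asks for an initial view that shows all points, but the model never sees the actual coordinates. The points written in step 7 are jittered around theme centers at latitudes of roughly 34 to 57 and longitudes of roughly -128 to -116, so one option is to compute the bounding box from the GeoJSON and adjust the generated page by hand. A minimal sketch, using a made-up two-point sample and a hypothetical helper `feature_bounds`:

```python
def feature_bounds(geojson: dict) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) over all Point features."""
    coords = [
        f["geometry"]["coordinates"]
        for f in geojson.get("features", [])
        if f.get("geometry", {}).get("type") == "Point"
    ]
    if not coords:
        raise ValueError("no Point features found")
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return min(lons), min(lats), max(lons), max(lats)


# Made-up sample in the same shape as the file written in step 7.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-123.0, 46.0]}, "properties": {}},
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-119.5, 42.0]}, "properties": {}},
    ],
}

west, south, east, north = feature_bounds(sample)
center = [(west + east) / 2, (south + north) / 2]
print((west, south, east, north), center)
```

With the real file, load it with `json.load` first. The resulting bounds could then be written into the prompt or passed to MapLibre's `fitBounds` in the generated page.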

