### Prompt Injections Dataset
- **Description:** Text prompts related to prompt injection attacks (662 rows). Each prompt has a class `label` and precomputed 2D coordinates for plotting.
- **Data Source:** [prompt_injection_w_umap_embeddings.tsv](https://www.dropbox.com/scl/fi/88lky7ogiugfkngzo8blq/prompt_injection_w_umap_embeddings.tsv?rlkey=6f1tfws5oswvzska29l1l4l2i&dl=1)
- **Potential columns for visualization:**
  - **X & Y Coordinates:** `x`, `y`
  - **Point Size:** `size`
  - **Color:** `label`
  - **Label:** `text`
- **Related code file:** [prompt_injections.py](https://github.com/thorwhalen/imbed_data_prep/blob/main/imbed_data_prep/prompt_injections.py)
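You can keep the column roles listed above in one place and check them against the file's header before plotting. The sketch below is hypothetical: the `VIZ_COLUMNS` dict and the `check_columns` helper are not part of `cosmodata` or `cosmograph`. It reads only the header line of a TSV using the standard library. The dict keys mirror the `point_*_by` arguments passed to `cosmo` further down.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical mapping from cosmo() keyword arguments to the dataset's columns,
# following the "Potential columns for visualization" list above.
VIZ_COLUMNS: Dict[str, str] = {
    "point_x_by": "x",
    "point_y_by": "y",
    "point_size_by": "size",
    "point_color_by": "label",
    "point_label_by": "text",
}


def check_columns(tsv_file: TextIO, mapping: Dict[str, str] = VIZ_COLUMNS) -> List[str]:
    """Return the mapped column names missing from the TSV header (empty if all are present)."""
    header = next(csv.reader(tsv_file, delimiter="\t"))
    return [col for col in mapping.values() if col not in header]


# Toy file using the column names of the real data (the row values are made up).
sample = io.StringIO("text\tlabel\tx\ty\tid\tsize\nhello\t0\t1.0\t2.0\t0\t5\n")
print(check_columns(sample))  # []
```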

## Get data

### Data parameters

In [1]:
ext = '.tsv'
src = 'https://www.dropbox.com/scl/fi/88lky7ogiugfkngzo8blq/prompt_injection_w_umap_embeddings.tsv?rlkey=6f1tfws5oswvzska29l1l4l2i&dl=1'
target_filename = 'prompt_injection_w_umap_embeddings.tsv'
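The `src` URL is a Dropbox shared link ending in `dl=1`. This parameter makes Dropbox serve the raw file instead of its HTML preview page (`dl=0`). If you substitute your own Dropbox link, you can use a small helper to set that query parameter while leaving the others, such as `rlkey`, unchanged. The sketch below uses only the standard library, and `force_direct_download` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(url: str) -> str:
    """Return `url` with its `dl` query parameter set to "1", keeping the other parameters."""
    parts = urlsplit(url)
    # Drop any existing `dl` value, then append dl=1 at the end.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


preview_link = "https://www.dropbox.com/scl/fi/abc/file.tsv?rlkey=xyz&dl=0"  # made-up link
print(force_direct_download(preview_link))
# https://www.dropbox.com/scl/fi/abc/file.tsv?rlkey=xyz&dl=1
```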

### Install and import

In [2]:
import os

# Install the dependencies unless the IN_COSMO_DEV_ENV environment variable is set
if not os.getenv('IN_COSMO_DEV_ENV'):
    %pip install -q cosmograph tabled cosmodata

from functools import partial

import tabled
import cosmodata
from cosmograph import cosmo

### Load data

In [3]:
if ext:
    getter = partial(tabled.get_table, ext=ext)
else:
    getter = tabled.get_table
# acquire_data also caches the data locally, so later loads are faster.
# (To force a fresh download, delete the local cache file.)
data = cosmodata.acquire_data(src, target_filename, getter=getter)

Fetching data from https://www.dropbox.com/scl/fi/88lky7ogiugfkngzo8blq/prompt_injection_w_umap_embeddings.tsv?rlkey=6f1tfws5oswvzska29l1l4l2i&dl=1...
Data cached at: /Users/thorwhalen/.local/share/cosmodata/datasets/prompt_injection_w_umap_embeddings.tsv.pkl
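The log above suggests that `acquire_data` downloads the file once, parses it with `getter`, and stores the parsed result as a pickle in a local cache directory. The sketch below imitates that fetch-once pattern with the standard library under those assumptions. It is not `cosmodata`'s actual implementation, and `cached_acquire`, `fetch`, `parse`, and `parse_tsv` are hypothetical names. Because `fetch` is passed in as a function, the example runs without network access. In the notebook, `getter` plays the role of `parse`: `partial` simply pre-fills its `ext` argument.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def cached_acquire(
    src: str,
    cache_path: Path,
    fetch: Callable[[str], bytes],
    parse: Callable[[bytes], Any],
) -> Any:
    """Return parsed data for `src`, fetching and parsing only if `cache_path` does not exist yet."""
    if cache_path.exists():
        # Cache hit: skip both the download and the parsing step.
        return pickle.loads(cache_path.read_bytes())
    data = parse(fetch(src))
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(pickle.dumps(data))
    return data


def parse_tsv(raw: bytes) -> list:
    """Split tab-separated bytes into rows of string fields."""
    return [line.split("\t") for line in raw.decode("utf-8").splitlines()]


calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)  # record each "download" so the effect of the cache is visible
    return b"text\tlabel\nhello\t0\n"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.tsv.pkl"
    first = cached_acquire("https://example.com/demo.tsv", path, fake_fetch, parse_tsv)
    second = cached_acquire("https://example.com/demo.tsv", path, fake_fetch, parse_tsv)
    print(first == second, len(calls))  # True 1
```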


## Peep at the data

In [4]:
mode = 'short'  # one of 'short', 'sample', 'stats'
exclude_cols = []  # columns to leave out of the printout
cosmodata.print_dataframe_info(data, exclude_cols, mode=mode)

DataFrame shape: (662, 6)
First row
------------------------------------------------------------
text     Refugee crisis in Europe solutions
label                                     0
x                                  8.972539
y                                   1.02286
id                                        0
size                                     34
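Plain pandas can produce roughly the same peek. Adding a per-class count of `label` also helps you decide how many colors the palette needs. The `peek` helper below is hypothetical, and the toy DataFrame only reuses the column names shown above. With the real data you would call `peek(data)`.

```python
import pandas as pd


def peek(df: pd.DataFrame, label_col: str = "label") -> dict:
    """Return the shape, the first row, and the class counts of `label_col`."""
    return {
        "shape": df.shape,
        "first_row": df.iloc[0].to_dict(),
        "label_counts": df[label_col].value_counts().sort_index().to_dict(),
    }


# Toy rows with the same columns as the real data (values are made up).
toy = pd.DataFrame(
    {
        "text": ["a", "b", "c"],
        "label": [0, 1, 0],
        "x": [0.1, 0.2, 0.3],
        "y": [1.0, 2.0, 3.0],
        "id": [0, 1, 2],
        "size": [10, 20, 30],
    }
)
info = peek(toy)
print(info["shape"])  # (3, 6)
print(info["label_counts"])
```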


## Visualize data

### Scatter Plot of Points

Each point in this scatter plot corresponds to one row of the DataFrame, that is, one prompt. The `x` and `y` columns set each point's position. Judging by the file name, these are a 2D UMAP projection of the prompts' embeddings. The `size` column sets each point's size and the `label` column sets its color, so you can see whether prompts with the same label cluster together. The `text` column supplies the label displayed for each point, and `id` identifies the points.

In [16]:
cosmo(
    data,
    point_x_by="x",
    point_y_by="y",
    point_size_by="size",
    point_color_by="label",
    point_id_by="id",
    point_label_by="text",
    point_color_palette=["#ff0000", "#00ff00", "#0000ff"],
    point_size_range=[5, 25],
    point_greyout_opacity=0.1,
    # point_size_scale=2,
    show_labels=True,
    show_hovered_point_label=True,
    background_color="#222222",
    fit_view_on_init=True,
)

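The plot uses two data-driven encodings. `size` values are squeezed into `point_size_range=[5, 25]`, and each distinct `label` is given a color from `point_color_palette`. This notebook does not document Cosmograph's exact scaling. The sketch below therefore assumes a simple linear rescale, with palette colors assigned in sorted label order and cycling if there are more labels than colors. `rescale_sizes` and `assign_colors` are hypothetical helpers meant only to make the mapping concrete.

```python
from typing import Dict, Hashable, List, Sequence


def rescale_sizes(values: Sequence[float], lo: float = 5, hi: float = 25) -> List[float]:
    """Linearly map values onto [lo, hi]; every point gets `lo` if all values are equal."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [float(lo)] * len(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def assign_colors(labels: Sequence[Hashable], palette: Sequence[str]) -> Dict[Hashable, str]:
    """Give each distinct label a palette color, in sorted label order, cycling if needed."""
    return {lab: palette[i % len(palette)] for i, lab in enumerate(sorted(set(labels)))}


print(rescale_sizes([34, 10, 58]))  # made-up sizes -> [15.0, 5.0, 25.0]
print(assign_colors([0, 1, 0], ["#ff0000", "#00ff00", "#0000ff"]))
# {0: '#ff0000', 1: '#00ff00'}
```

With only two distinct labels, the third palette color would go unused in this sketch.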