# Tag message data
Notebook that tags message data: it reads the input messages for a run date, tags them, and persists the tagged output.

## Expected message input:

| Property | Data Type | Description |
| :------- | :-------- | :---------- |
| objectId | string | Id of the tweet, post or comment |
| message  | string | Message data to be analysed |
| objectType | string | The type of data: enum(tweet, post, comment) |
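
The input contract above can be checked before tagging. The following is a minimal sketch using only the standard library; `validate_message`, `OBJECT_TYPES` and `INPUT_PROPERTIES` are hypothetical names introduced for illustration and are not part of `phoenix`.

```python
from typing import Dict, List

# Allowed values for `objectType`, taken from the input table above.
OBJECT_TYPES = {"tweet", "post", "comment"}
# Required input properties; the table lists all of them as strings.
INPUT_PROPERTIES = ("objectId", "message", "objectType")


def validate_message(record: Dict[str, object]) -> List[str]:
    """Return the problems found in one input record (empty list if valid)."""
    problems = []
    for name in INPUT_PROPERTIES:
        if name not in record:
            problems.append(f"missing property: {name}")
        elif not isinstance(record[name], str):
            problems.append(f"{name} must be a string")
    # Only check the enum when the value is a string, so bad types are reported once.
    object_type = record.get("objectType")
    if isinstance(object_type, str) and object_type not in OBJECT_TYPES:
        problems.append(f"unknown objectType: {object_type}")
    return problems


example = {"objectId": "1", "message": "Hello world", "objectType": "tweet"}
print(validate_message(example))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a record at once.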

## Expected tagging output:

| Property | Data Type | Description |
| :------- | :-------- | :---------- |
| objectId | string | Id of the tweet, post or comment |
| message  | string | Message data to be analysed |
| objectType | string | The type of data: enum(tweet, post, comment) |
| cleanedMessage | string | Message that has been cleaned |
| language | string | enum("ar", "en", "ar_izi") |
| arabicMessage | string | Message translated to Arabic |
| features | array\<string\> | Features based on the cleaned message |
| sentiment | DICTIONARY | enum("positive", "negative", "neutral") |
| topics | array\<string\> | Ordered list of the topics that are needed |
| topic1 | string | Topic 1 |
| topic2 | string | Topic 2 |
| topic3 | string | Topic 3 |
| hasKeyFeature | bool | Whether the post has a key feature |
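
A tagged record can be sanity-checked against part of the output table. The sketch below is illustrative only: `check_tagged_record` is a hypothetical helper, and it assumes that the three input properties are carried through unchanged, which the table suggests but does not state.

```python
from typing import Dict, List

# Properties that appear in both the input and the output table.
INPUT_PROPERTIES = ("objectId", "message", "objectType")
# Allowed values for `language`, taken from the output table above.
LANGUAGES = {"ar", "en", "ar_izi"}


def check_tagged_record(source: Dict[str, object], tagged: Dict[str, object]) -> List[str]:
    """Return the problems found when comparing a tagged record to its source."""
    problems = []
    # Assumption: tagging only adds properties and leaves the input ones untouched.
    for name in INPUT_PROPERTIES:
        if tagged.get(name) != source.get(name):
            problems.append(f"{name} was not carried through unchanged")
    language = tagged.get("language")
    if not isinstance(language, str) or language not in LANGUAGES:
        problems.append(f"unexpected language: {language!r}")
    topics = tagged.get("topics")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        problems.append("topics must be a list of strings")
    return problems


source = {"objectId": "1", "message": "Hello world", "objectType": "tweet"}
tagged = {**source, "language": "en", "topics": ["greeting"]}
print(check_tagged_record(source, tagged))
```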

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import datetime

from phoenix.common import artifacts
from phoenix.common import utils
from phoenix.tag import tag

In [None]:
utils.setup_notebook_output()
utils.setup_notebook_logging()

In [None]:
# Parametrise the run execution date.
# Format of the run date
RUN_DATE_FORMAT = "%Y-%m-%d"
# RUN_DATE can be overridden at execution time by Papermill to enable historic runs and backfills.
RUN_DATE = datetime.datetime.today().strftime(RUN_DATE_FORMAT)

# Input and output
INPUT_MESSAGE_URL = f"{artifacts.urls.get_local()}{RUN_DATE}/input_message_data.parquet"
OUTPUT_TAGGED_URL = f"{artifacts.urls.get_local()}{RUN_DATE}/output_tagged_data.parquet"
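
For a backfill, the notebook would be executed once per historic date, with each date passed in as `RUN_DATE`. The sketch below only generates those date strings in the same format as `RUN_DATE_FORMAT`; `backfill_run_dates` is a hypothetical helper, and how each value is handed to Papermill is left out.

```python
import datetime
from typing import List

# Same format as the notebook's RUN_DATE_FORMAT parameter.
RUN_DATE_FORMAT = "%Y-%m-%d"


def backfill_run_dates(start: str, end: str) -> List[str]:
    """Return every run date from start to end (inclusive) as formatted strings."""
    first = datetime.datetime.strptime(start, RUN_DATE_FORMAT).date()
    last = datetime.datetime.strptime(end, RUN_DATE_FORMAT).date()
    if last < first:
        raise ValueError("end must not be before start")
    days = (last - first).days
    return [
        (first + datetime.timedelta(days=offset)).strftime(RUN_DATE_FORMAT)
        for offset in range(days + 1)
    ]


print(backfill_run_dates("2021-02-27", "2021-03-01"))
```

Because the input and output URLs are derived from `RUN_DATE`, each backfilled run reads and writes under its own dated path.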

In [None]:
# Display params.
print(
    INPUT_MESSAGE_URL,
    OUTPUT_TAGGED_URL,
    RUN_DATE,
    sep='\n',
)

In [None]:
utils.dask_global_init()

In [None]:
message_df = artifacts.dataframes.get(INPUT_MESSAGE_URL).dataframe

In [None]:
tagged_df = tag.tag_dataframe(message_df)

In [None]:
tagged_df.head()

In [None]:
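# Persist the tagged dataframe to the output URL.
# Assumes `persist(url, dataframe)`, mirroring `artifacts.dataframes.get(url)` above.
artifacts.dataframes.persist(OUTPUT_TAGGED_URL, tagged_df)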