# Creating text representations

The goal of this notebook is to convert data about a Manifold Markets market into a text representation that can be fed to an embedding model.

* Author: jgyou <jyoung22@uvm.edu>
* Date: 12/29/2023
* License: MIT


In [143]:
import manifoldpy
from datetime import datetime
from collections.abc import Iterable

Some utility functions for handling the less convenient data types in the API responses (millisecond timestamps and rich-text comment bodies):

In [88]:
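from datetime import datetime, timezone

def _text_date(timestamp):
    """Turn a timestamp in milliseconds since the Unix epoch into a UTC date string (mm-dd-yyyy)."""
    # datetime.utcfromtimestamp is deprecated since Python 3.12; pass an explicit UTC timezone instead
    return datetime.fromtimestamp(timestamp / 1000, tz=timezone.utc).strftime('%m-%d-%Y')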
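The division by 1000 in `_text_date` is needed because the timestamps arrive in milliseconds, while Python's `datetime` functions expect seconds. A standalone check of the conversion, using arbitrary example values rather than data from a real market:

```python
from datetime import datetime, timezone

def text_date(timestamp_ms):
    """Milliseconds since the Unix epoch -> UTC date string in mm-dd-yyyy format."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime('%m-%d-%Y')

print(text_date(1_700_000_000_000))  # 11-14-2023 (2023-11-14 22:13:20 UTC)
# The last millisecond of a UTC day and the first millisecond of the next one:
print(text_date(1_700_006_399_999))  # 11-14-2023
print(text_date(1_700_006_400_000))  # 11-15-2023
```

Because the date is computed in UTC, a comment posted in the evening in a US time zone can carry the next day's date.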

In [144]:
def _textify(comment):
    """Extract the plain-text fragments of a comment's rich-text content as a list of strings."""
    return [node['text']
            for block in comment.content['content']
            if block['type'] == 'paragraph' and isinstance(block.get('content'), Iterable)
            for node in block['content']
            if node.get('type') == 'text']
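`_textify` only reads text nodes that sit directly inside top-level paragraphs, so text nested in other blocks (for instance list items, which wrap their own paragraphs) never reaches the representation. The sketch below contrasts that behavior with a hypothetical recursive walker, `textify_recursive`, which is not part of this notebook. The comment body is hand-written in a simplified rich-text shape using the `type`, `content` and `text` keys that `_textify` reads; the block names `bulletList` and `listItem` are assumptions about how list content is encoded, and `SimpleNamespace` stands in for a manifoldpy comment object.

```python
from collections.abc import Iterable
from types import SimpleNamespace

def textify(comment):
    """Paragraph-only extraction, mirroring _textify above."""
    return [node['text']
            for block in comment.content['content']
            if block['type'] == 'paragraph' and isinstance(block.get('content'), Iterable)
            for node in block['content']
            if node.get('type') == 'text']

def textify_recursive(node):
    """Hypothetical alternative: depth-first walk collecting 'text' nodes at any depth."""
    if node.get('type') == 'text':
        return [node['text']]
    texts = []
    for child in node.get('content') or []:  # missing or None 'content' means no children
        texts.extend(textify_recursive(child))
    return texts

# Hand-written comment body in an assumed, simplified shape (not a real API response).
comment = SimpleNamespace(content={
    'type': 'doc',
    'content': [
        {'type': 'paragraph', 'content': [{'type': 'text', 'text': 'Two sources:'}]},
        {'type': 'paragraph'},  # empty paragraph: no 'content' key
        {'type': 'bulletList', 'content': [
            {'type': 'listItem', 'content': [
                {'type': 'paragraph', 'content': [{'type': 'text', 'text': 'SF Standard'}]}]},
            {'type': 'listItem', 'content': [
                {'type': 'paragraph', 'content': [{'type': 'text', 'text': 'a tweet'}]}]},
        ]},
    ],
})

print(textify(comment))                    # ['Two sources:']
print(textify_recursive(comment.content))  # ['Two sources:', 'SF Standard', 'a tweet']
```

Which behavior is preferable depends on the representation: the recursive walk keeps more text, but it also picks up text from every block type, including ones that might be better left out.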


In [145]:
def comment_text_representation(comment):
    """Turn a comment into a full text representation including dates and likes."""
    return (f"[Date]: {_text_date(comment.createdTime)}\n"
            f"[Likes] {comment.likes}\n"
            f"[Text]: {' '.join(_textify(comment))}\n")
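The layout produced by `comment_text_representation` can be checked without calling the API by passing a stand-in object. The sketch below re-implements the same three lines with standalone copies of the helpers, plus a hypothetical `skip_missing` flag (not in the notebook) that drops fields whose value is `None`. In the example run below, every comment prints `[Likes] None`, which adds tokens to the embedding input without adding information. All field values here are made up.

```python
from collections.abc import Iterable
from datetime import datetime, timezone
from types import SimpleNamespace

def text_date(timestamp_ms):
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime('%m-%d-%Y')

def textify(comment):
    return [node['text']
            for block in comment.content['content']
            if block['type'] == 'paragraph' and isinstance(block.get('content'), Iterable)
            for node in block['content']
            if node.get('type') == 'text']

def comment_text(comment, skip_missing=False):
    """Same layout as comment_text_representation; optionally drop fields that are None."""
    fields = [('[Date]: ', text_date(comment.createdTime)),
              ('[Likes] ', comment.likes),
              ('[Text]: ', ' '.join(textify(comment)))]
    return ''.join(f"{label}{value}\n" for label, value in fields
                   if not (skip_missing and value is None))

# Stand-in comment with made-up values.
comment = SimpleNamespace(
    createdTime=1_700_000_000_000,
    likes=None,
    content={'type': 'doc', 'content': [
        {'type': 'paragraph', 'content': [{'type': 'text', 'text': 'Deadline extended again.'}]}]},
)
print(comment_text(comment), end='')
# [Date]: 11-14-2023
# [Likes] None
# [Text]: Deadline extended again.
print(comment_text(comment, skip_missing=True), end='')
# [Date]: 11-14-2023
# [Text]: Deadline extended again.
```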

In [147]:
def market_text_representation(marketSlug):
    """Construct text representation of a market from its slug.
    
    Parameters
    ----------
    marketSlug : str
        The slug of the market to be represented.
    
    Returns
    -------
    str
        The text representation of the market.
    """
    # get market from slug
    market = manifoldpy.api.get_slug(marketSlug)
    # fetch all comments on the market, then keep only those written by its creator
    comment_data = manifoldpy.api.get_comments(marketSlug=marketSlug)
    comments = [c for c in comment_data if c.userId == market.creatorId]
    representation = "[Market title] " + market.question + "\n" + \
        '[Market description] ' + market.textDescription + "\n" + \
        '[Market creator] ' + market.creatorName + "\n" + \
        '[Creation date] ' + _text_date(market.createdTime) + "\n" + \
        '[Closing date] ' + _text_date(market.closeTime) + "\n" + \
        "------------------------------------------------------\n" + \
        "Comments by market creator\n" + \
        "------------------------------------------------------\n" + \
        "---\n".join([comment_text_representation(c) for c in comments])
    return representation

In [150]:
exampleSlug = 'will-builders-remedy-take-effect-fo'

representation = market_text_representation(exampleSlug)
print(representation)

[Market title] Will Builders remedy take effect for San Francisco in 2023?
[Market description] if San Francisco fails to satisfy its RHNA requirements and the state puts builders remedy into effect for the city this will resolve yes. 
[Market creator] Kevin Kwok
[Creation date] 10-28-2023
[Closing date] 01-01-2024
------------------------------------------------------
Comments by market creator
------------------------------------------------------
[Date]: 11-28-2023
[Likes] None
[Text]: Looks like HCD gave another 30 days extension. Between that and some lack of clarity on whether builders remedy would be the immediate consequence, chances of builders remedy by EOY looking lower https://sfstandard.com/2023/11/27/san-francisco-blows-state-housing-deadline-constraints-reduction-ordinance/
---
[Date]: 10-31-2023
[Likes] None
[Text]: https://twitter.com/_fruchtose/status/1719120862559997991?s=12&t=lMwdlElrn7H4f1W6FIDZ9w
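`market_text_representation` mixes fetching (two API calls) with formatting, so its layout can only be checked against live data. One way to separate the two is a pure function that takes an already-fetched market and pre-rendered comment blocks, sketched below as a hypothetical `format_market` helper. `SimpleNamespace` stands in for the manifoldpy market object, and only the attributes the notebook already reads are assumed. The sketch also tolerates a missing description or close time, either of which would make the string concatenation above raise a `TypeError`; whether the API actually returns such markets is an assumption here.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

RULE = "------------------------------------------------------"

def text_date(timestamp_ms):
    """Milliseconds since the Unix epoch -> UTC mm-dd-yyyy, or 'unknown' when missing."""
    if timestamp_ms is None:
        return 'unknown'
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime('%m-%d-%Y')

def format_market(market, comment_blocks):
    """Hypothetical pure formatter with the same fields as market_text_representation.

    `market` needs question, textDescription, creatorName, createdTime and closeTime;
    `comment_blocks` are already-rendered comment strings, each ending in a newline.
    """
    header = '\n'.join([
        f"[Market title] {market.question}",
        f"[Market description] {market.textDescription or ''}",
        f"[Market creator] {market.creatorName}",
        f"[Creation date] {text_date(market.createdTime)}",
        f"[Closing date] {text_date(market.closeTime)}",
        RULE,
        "Comments by market creator",
        RULE,
    ])
    # Separate consecutive comment blocks with a "---" line, as the notebook does.
    return header + '\n' + '---\n'.join(comment_blocks)

# Stand-in market with made-up values and no description or close time.
market = SimpleNamespace(question='Will the example market resolve YES?', textDescription=None,
                         creatorName='Example Creator', createdTime=1_700_000_000_000,
                         closeTime=None)
blocks = ["[Date]: 11-14-2023\n[Likes] None\n[Text]: First update.\n",
          "[Date]: 11-15-2023\n[Likes] None\n[Text]: Second update.\n"]
print(format_market(market, blocks))
```

With this split, `market_text_representation` would only fetch the market and the creator's comments and then call the formatter, which also makes it easier to try different representations on the same fetched data.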



TODO:
* Explore different representations.
* Add access via market IDs (in addition to slugs).
* Apply at scale.
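For the "apply at scale" item, one possible starting point is a batch wrapper that runs a per-slug function over many slugs and collects failures instead of stopping at the first one. The sketch below uses a thread pool because the work is dominated by HTTP calls; the helper name `represent_many` and the `max_workers=4` default are hypothetical choices, it assumes the underlying API calls are safe to run concurrently, and it does not handle rate limiting. The usage example passes a stand-in function so that no network access is needed; in the notebook, `market_text_representation` would take its place.

```python
from concurrent.futures import ThreadPoolExecutor

def represent_many(slugs, represent, max_workers=4):
    """Build text representations for many market slugs.

    `represent` is any callable mapping a slug to a string. Returns two dicts:
    slug -> text for successes and slug -> exception for failures.
    """
    representations, failures = {}, {}

    def attempt(slug):
        try:
            return slug, represent(slug), None
        except Exception as exc:  # one bad market should not abort the whole batch
            return slug, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input slugs
        for slug, text, exc in pool.map(attempt, slugs):
            if exc is None:
                representations[slug] = text
            else:
                failures[slug] = exc
    return representations, failures

# Stand-in for market_text_representation, so the example runs offline.
def fake_represent(slug):
    if slug == 'missing-market':
        raise LookupError(slug)
    return f"[Market title] {slug}"

texts, errors = represent_many(['market-a', 'missing-market', 'market-b'], fake_represent)
print(sorted(texts))  # ['market-a', 'market-b']
print(list(errors))   # ['missing-market']
```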