This repository was archived by the owner on Aug 7, 2024. It is now read-only.

Tweet serialization & deserialization loses (at least) data about media #484

@M4rtinK

I've recently implemented tweet caching in my Twitter app, using AsDict() for serialization and NewFromJsonDict() for deserialization.

But I've noticed an inconsistency - if the serialized tweet has media, the media description is correctly serialized to the dict by AsDict(), but when the tweet is deserialized with NewFromJsonDict() the media property of the new Status instance is None.

I've also tried passing the dict to the Status constructor as kwargs. In that case media is not None, but it contains a list of dicts describing the media rather than the expected list of Media instances.

This is a short reproducer demonstrating the issue:

#!/usr/bin/python3

import twitter

as_dict_output = {
    'id_str': '874353511328169984',
    'media': [
        {'media_url': 'http://pbs.twimg.com/media/DCJTgZ_UIAAwFnw.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54', 'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJTgZ_UIAAwFnw.jpg', 'sizes': {'medium': {'resize': 'fit', 'w': 1200, 'h': 1149}, 'large': {'resize': 'fit', 'w': 2048, 'h': 1960}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}, 'small': {'resize': 'fit', 'w': 680, 'h': 651}}, 'id': 874353093860663296},
        {'media_url': 'http://pbs.twimg.com/media/DCJThr7UwAE9rCo.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54', 'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJThr7UwAE9rCo.jpg', 'sizes': {'medium': {'resize': 'fit', 'w': 1200, 'h': 1114}, 'large': {'resize': 'fit', 'w': 2048, 'h': 1901}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}, 'small': {'resize': 'fit', 'w': 680, 'h': 631}}, 'id': 874353115855634433},
        {'media_url': 'http://pbs.twimg.com/media/DCJTi68UQAA1ZPE.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54', 'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJTi68UQAA1ZPE.jpg', 'sizes': {'medium': {'resize': 'fit', 'w': 1134, 'h': 1200}, 'small': {'resize': 'fit', 'w': 643, 'h': 680}, 'large': {'resize': 'fit', 'w': 1935, 'h': 2048}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}}, 'id': 874353137066196992},
        {'media_url': 'http://pbs.twimg.com/media/DCJTjY3UwAQbErC.jpg', 'url': 'https://t.co/JDv3Iz9L54', 'type': 'photo', 'display_url': 'pic.twitter.com/JDv3Iz9L54', 'expanded_url': 'https://twitter.com/Space_Station/status/874353511328169984/photo/1', 'media_url_https': 'https://pbs.twimg.com/media/DCJTjY3UwAQbErC.jpg', 'sizes': {'medium': {'resize': 'fit', 'w': 1200, 'h': 1149}, 'large': {'resize': 'fit', 'w': 2048, 'h': 1960}, 'thumb': {'resize': 'crop', 'w': 150, 'h': 150}, 'small': {'resize': 'fit', 'w': 680, 'h': 651}}, 'id': 874353145098321924},
    ],
    'urls': [],
    'favorited': True,
    'full_text': '19 years ago today on June 12, 1998, Shuttle-Mir program ended when space shuttle Discovery landed with Mir crew member Andy Thomas. https://t.co/JDv3Iz9L54',
    'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    'user': {
        'following': True, 'profile_background_image_url': 'http://pbs.twimg.com/profile_background_images/517439388741931008/iRbQw1ch.jpeg', 'location': 'Low Earth Orbit', 'utc_offset': -18000, 'profile_background_color': 'C0DEED',
        'description': "NASA's page for updates from the International Space Station, the world-class lab orbiting Earth 250 miles above. For the latest research, follow @ISS_Research.",
        'favourites_count': 5099, 'id': 1451773004, 'profile_image_url': 'http://pbs.twimg.com/profile_images/822552192875892737/zO1pmxzw_normal.jpg', 'profile_sidebar_fill_color': 'DDEEF6', 'screen_name': 'Space_Station', 'time_zone': 'Central Time (US & Canada)',
        'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1451773004/1497549511', 'friends_count': 234, 'verified': True, 'profile_text_color': '333333', 'profile_link_color': '0084B4', 'url': 'https://t.co/9Gk2H0gekn',
        'created_at': 'Thu May 23 15:25:28 +0000 2013', 'name': 'Intl. Space Station', 'listed_count': 7602, 'followers_count': 1635947, 'statuses_count': 6976, 'lang': 'en',
    },
    'favorite_count': 1260,
    'created_at': 'Mon Jun 12 19:51:36 +0000 2017',
    'hashtags': [],
    'user_mentions': [],
    'retweet_count': 415,
    'retweeted': True,
    'lang': 'en',
    'id': 874353511328169984,
}

print("tweet dict from AsDict() has media:")
print(as_dict_output["media"])



new_from_json_tweet = twitter.models.Status.NewFromJsonDict(as_dict_output)
print("tweet deserialized by NewFromJsonDict() doesn't have media:")
print(new_from_json_tweet.media)

kwargs_tweet = twitter.models.Status(**as_dict_output)
print("tweet created by passing the dict as kwargs has media:")
print(kwargs_tweet.media)
print("but it's just a list of dicts, not a list of twitter.models.Media instances")

Looking at the serialization/deserialization code in models.py, it seems the issue is caused by a key mismatch: AsDict() puts the dicts representing the Media instances under a top-level key called media, but NewFromJsonDict() looks for media only in the dict under the "entities" or "extended_entities" key, not in the top-level dict namespace:

        if 'entities' in data:
            if 'urls' in data['entities']:
                urls = [Url.NewFromJsonDict(u) for u in data['entities']['urls']]
            if 'user_mentions' in data['entities']:
                user_mentions = [User.NewFromJsonDict(u) for u in data['entities']['user_mentions']]
            if 'hashtags' in data['entities']:
                hashtags = [Hashtag.NewFromJsonDict(h) for h in data['entities']['hashtags']]
            if 'media' in data['entities']:
                media = [Media.NewFromJsonDict(m) for m in data['entities']['media']]

        # the new extended entities
        if 'extended_entities' in data:
            if 'media' in data['extended_entities']:
                media = [Media.NewFromJsonDict(m) for m in data['extended_entities']['media']]

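The mismatch can be shown without the library or network access. The following is a minimal sketch that mirrors only the lookup order of the excerpt above; find_media_dicts is a hypothetical helper, not part of python-twitter, and it returns the raw dicts instead of building Media instances.

```python
def find_media_dicts(data):
    """Mirror the lookup in the excerpt: only nested keys are consulted."""
    media = None
    if 'entities' in data and 'media' in data['entities']:
        media = data['entities']['media']
    # As in the excerpt, extended_entities overrides entities when both exist.
    if 'extended_entities' in data and 'media' in data['extended_entities']:
        media = data['extended_entities']['media']
    return media


# Layout as returned by the Twitter API: media nested under extended_entities.
api_shaped = {'id': 1, 'extended_entities': {'media': [{'type': 'photo'}]}}
# Layout as produced by AsDict(): media at the top level.
as_dict_shaped = {'id': 1, 'media': [{'type': 'photo'}]}

print(find_media_dicts(api_shaped))      # [{'type': 'photo'}]
print(find_media_dicts(as_dict_shaped))  # None
```

The top-level "media" key is never inspected, which matches the None seen in the reproducer.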
So these possible solutions come to mind:

  • AsDict() should place the list of media dicts in ["entities"]["media"] or ["extended_entities"]["media"] instead of the top-level "media" key
  • NewFromJsonDict() should also look for media in the top-level "media" key
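Until the library changes, the first option could be approximated on the caller side by reshaping the cached dict before deserializing it. This is only a sketch: nest_entities is a hypothetical helper, not part of python-twitter, and moving "urls", "user_mentions" and "hashtags" along with "media" is an assumption based on the lookup code quoted above. It has not been checked against every library version.

```python
import copy

# Keys that the quoted NewFromJsonDict() code reads from data['entities'].
ENTITY_KEYS = ('urls', 'user_mentions', 'hashtags', 'media')


def nest_entities(as_dict_output):
    """Return a copy of an AsDict()-style dict with entity lists moved
    from the top level into data['entities']."""
    data = copy.deepcopy(as_dict_output)  # leave the cached dict untouched
    entities = data.setdefault('entities', {})
    for key in ENTITY_KEYS:
        # Do not overwrite anything that is already nested.
        if key in data and key not in entities:
            entities[key] = data.pop(key)
    return data


cached = {'id': 1, 'media': [{'type': 'photo'}], 'urls': []}
nested = nest_entities(cached)
print(nested)
# With python-twitter installed, the result would then be passed on:
# status = twitter.models.Status.NewFromJsonDict(nested)
```

The helper copies its input so the cache entry itself stays in the AsDict() layout.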
