# Create api_example cache

The goal is to iterate through every available API call and produce an example output for each.

To do this, we will need valid data for each one.

Is the easiest way to create a list of the valid API calls, then have a list of methods and iterate over those methods?

It's less efficient, as it will require a few extra API calls to extract the valid data, but that might not be a problem.

Let's build it that way first and see. What steps are needed to do this?

## 1 API QUERY EXTRACTOR

1. a list of the required valid API queries
2. a plan for getting each one, with as few API calls as possible

Okay, let's work on that.

Here are the API calls, grouped by the input each one needs:

- Trendingpodcasts - nothing
- Recentepisodes - nothing

- Search - random string
- Episodesbyperson - famous person

- feed url - Podcastbyfeedurl, Episodesbyfeedurl
- feed id - Podcastbyfeedid, Episodesbyfeedid
- itunes id - Podcastbyitunesid, Episodesbyitunesid
 
- episode id - Episodebyid
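
The grouping above could be captured as a small lookup table, so that a single loop can pair every call with the value it needs. This is only a sketch: `CALLS_BY_INPUT`, its key names and `plan_calls` are assumptions made for illustration, and the call names are just the labels from the list, not real method names.

```python
# Hypothetical lookup: key = the kind of input a call needs (None = no input),
# value = the call labels from the list above.
CALLS_BY_INPUT = {
    None: ["Trendingpodcasts", "Recentepisodes"],
    "search_term": ["Search"],
    "person": ["Episodesbyperson"],
    "feed_url": ["Podcastbyfeedurl", "Episodesbyfeedurl"],
    "feed_id": ["Podcastbyfeedid", "Episodesbyfeedid"],
    "itunes_id": ["Podcastbyitunesid", "Episodesbyitunesid"],
    "episode_id": ["Episodebyid"],
}


def plan_calls(valid_queries):
    """Pair each call label with the argument it needs (None if it needs nothing)."""
    plan = []
    for input_name, calls in CALLS_BY_INPUT.items():
        arg = None if input_name is None else valid_queries[input_name]
        for call in calls:
            plan.append((call, arg))
    return plan


# Illustrative values only
example_queries = {
    "search_term": "python",
    "person": "Guido van Rossum",
    "feed_url": "https://feeds.twit.tv/twit.xml",
    "feed_id": 555343,
    "itunes_id": 42441,
    "episode_id": 15039245759,
}

for call, arg in plan_calls(example_queries):
    print(call, arg)
```

Keeping the table separate from the loop means adding a new call later is a one-line change.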

We can use **trendingpodcasts** to get *feed url*, *feed id* and *itunes id*.

Should we use that for a person and a search string too? No, let's keep those simple: 'python' and 'Guido van Rossum'.

So that leaves a second API call, to one of the episodesby results I think, to get an episode id.

We need to check that the returned values are all valid, and if they're not, skip to the next result in the list.

Okay, so this part will work with:

- a) index instance or api wrapper
- b) trending podcasts call
- c) iterate over list of dictionaries
- d) check if the response values are valid
- e) if they are valid, break the loop
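
Steps c) to e) amount to "take the first item that passes a check". One possible way to write that, as a sketch: `first_valid_podcast`, `has_int_ids` and the sample data are assumptions for illustration, and the field names (`id`, `itunesId`, `url`) are the ones used in the trending payload.

```python
def first_valid_podcast(podcasts, is_valid):
    """Return the first podcast dict that passes is_valid, or None if none do."""
    return next((p for p in podcasts if is_valid(p)), None)


def has_int_ids(podcast):
    """Simplified check: both ids must be integers."""
    return isinstance(podcast["id"], int) and isinstance(podcast["itunesId"], int)


# Made-up sample: the first entry is deliberately invalid
sample_feeds = [
    {"id": None, "itunesId": None, "url": "www.podcast.com"},
    {"id": 555343, "itunesId": 42441, "url": "https://feeds.twit.tv/twit.xml"},
]

print(first_valid_podcast(sample_feeds, has_int_ids))
```

`next()` with a default stops at the first match, which is the same effect as breaking out of the loop.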

So let's work with the recents data we have already for now.

### Creating the valid query data function

#### Validate ids/url from TRENDING

In [None]:
import re

feed_id = 555343
itunes_id = 42441
feed_url = 'https://feeds.twit.tv/twit.xml'

feed_id1 = 'null'
itunes_id1 = 'null'
feed_url1 = 'www.podcast.com'


def is_valid_id(*args):
    for arg in args:
        if not isinstance(arg, int) or arg is None:
            return False
    return True 

def is_valid_url(url):
    url_pattern = re.compile(
        r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
    )
    if not isinstance(url, str) or not url_pattern.match(url):
        return False
    else:
        return True

#print(type(feedid))

# if is_valid_id(feedid, itunesid):
#     print("yes")
# else:
#     print("nope")

# if is_valid_url(url):
#     print("yes")
# else:
#     print("nope")

# Swap in feed_id1, itunes_id1 or feed_url1 to exercise the failure branch
if is_valid_id(feed_id, itunes_id) and is_valid_url(feed_url):
    valid_queries = {'feedid': feed_id, 'itunesid': itunes_id, 'url': feed_url}
    print(valid_queries)
    print("All is true")
else:
    print("Back to the drawing board")
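
A possible tightening of the two validators, offered as an alternative sketch rather than a replacement. It swaps the regex for `urllib.parse.urlparse` from the standard library, and it also rejects booleans, since `isinstance(True, int)` is `True` in Python. The `strict_` names are assumptions for illustration.

```python
from urllib.parse import urlparse


def strict_is_valid_id(*args):
    """True only if every argument is an int and not a bool."""
    return all(isinstance(a, int) and not isinstance(a, bool) for a in args)


def strict_is_valid_url(url):
    """True if url is a string with an http(s) scheme and a host part."""
    if not isinstance(url, str):
        return False
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(strict_is_valid_id(555343, 42441))
print(strict_is_valid_id("null"))
print(strict_is_valid_url("https://feeds.twit.tv/twit.xml"))
print(strict_is_valid_url("www.podcast.com"))
```

This checks structure only; it does not confirm that the feed actually exists.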


#### Get data from TRENDING (json)

In [None]:
import json
import re
from typing import Dict

def is_valid_id(*args):
    for arg in args:
        if not isinstance(arg, int) or arg is None:
            return False
    return True 

def is_valid_url(url):
    url_pattern = re.compile(
        r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
    )
    if not isinstance(url, str) or not url_pattern.match(url):
        return False
    else:
        return True

def get_valid_podcast_queries():

    with open('documenting_pi_api/payload_examples/011_trendingPodcasts.json', 'r') as file:
        payload = json.load(file)
   
    podcasts = payload['feeds']

    for podcast in podcasts:
        feed_id = podcast['id']
        itunes_id = podcast['itunesId']
        feed_url = podcast['url']
        # Return the first podcast whose three fields all validate
        if is_valid_id(feed_id, itunes_id) and is_valid_url(feed_url):
            return {'feedid': feed_id, 'itunesid': itunes_id, 'url': feed_url}

    # Reached only if no podcast had a full set of valid fields
    return "No valid query set found"

print(get_valid_podcast_queries())



#### CLASS - Get data from TRENDING (json)

In [None]:
import json
import re

class GetValidQueriesFromAPI:
    '''
    A class to extract valid queries from trending and episodesByFeedId api calls for use by CacheAPIOutputs.

    It extracts feed_id, itunes_id and feed_url from an API call to trending podcasts, returning the results from the first podcast 
    with three valid outputs for those fields.

    It then uses feed_id for an api call to episodesByFeedId to get a valid episode id. 
    '''

    def __init__(self):
        self.valid_queries = {}

    def is_valid_id(self, *args):
        for arg in args:
            if not isinstance(arg, int) or arg is None:
                return False
        return True

    def is_valid_url(self, url):
        url_pattern = re.compile(
            r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
        )
        if not isinstance(url, str) or not url_pattern.match(url):
            return False
        else:
            return True

    def get_podcast_fields(self):
        with open('documenting_pi_api/payload_examples/011_trendingPodcasts.json', 'r') as file:
            payload = json.load(file)

        podcasts = payload['feeds']

        for podcast in podcasts:
            feedId = podcast['id']
            itunesId = podcast['itunesId']
            feed_url = podcast['url']

            # Stop at the first podcast whose three fields all validate
            if self.is_valid_id(feedId, itunesId) and self.is_valid_url(feed_url):
                self.valid_queries = {'feedid': feedId, 'itunesid': itunesId, 'url': feed_url}
                return self.valid_queries

        return "No valid query set found"

### What's next

So we've now got this dictionary

valid_ids = {'feed_id': 555343, 'itunes_id': 42441, 'feed_url': 'https://rss.acast.com/findingannie'}

We now need the remaining valid ids, and perhaps we should add a dict like this:

valid_ids = {'search_term': 'python', 'person': 'Guido van Rossum'}

Then we'd combine the two.  So the idea would be to build this out - and by the end of this section, we have the whole thing.
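
Combining the two dictionaries, and checking what is still missing, might look like the sketch below. The names `podcast_ids`, `constants` and `REQUIRED_KEYS` are assumptions for illustration; the values are the ones shown above.

```python
podcast_ids = {
    'feed_id': 555343,
    'itunes_id': 42441,
    'feed_url': 'https://rss.acast.com/findingannie',
}
constants = {'search_term': 'python', 'person': 'Guido van Rossum'}

# Later keys win on a clash, but these two dicts share no keys
valid_ids = {**constants, **podcast_ids}

# Hypothetical set of every key the full run would need
REQUIRED_KEYS = {'search_term', 'person', 'feed_id', 'itunes_id', 'feed_url', 'episode_id'}

missing = REQUIRED_KEYS - valid_ids.keys()
print(valid_ids)
print(missing)
```

A set difference like this gives a quick programmatic version of a to-do checklist.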

REMINDER TO OURSELVES

- [x] Recentepisodes, Trendingpodcasts - nothing
- [x] Search - random string
- [x] Episodesbyperson - famous person
- [x] feed url - Podcastbyfeedurl, Episodesbyfeedurl
- [x] feed id - Podcastbyfeedid, Episodesbyfeedid
- [x] itunes id - Podcastbyitunesid, Episodesbyitunesid
- [ ] episode id - Episodebyid

So we just need episode id.  

This really gets into a bigger structural question, doesn't it?

Where will this data exist?

### Get data from EPISODE BY FEEDID

In [89]:
import json

def is_valid_id(*args):
    for arg in args:
        if not isinstance(arg, int) or arg is None:
            return False
    # Only report success once every argument has been checked
    return True


def get_episode_field():
        with open('documenting_pi_api/payload_examples/006_episodesByFeedId.json', 'r') as file:
            payload = json.load(file)

        # print(payload)
        episodes = payload['items']
        print(episodes)

        for episode in episodes:
            episode_id = episode['id']
            print(episode_id)
            if is_valid_id(episode_id):
                 return episode_id 
        else:
            return "No valid episode_ids"  

print(get_episode_field())


[{'id': 15039245759, 'title': 'Episode 188 – Money, Money, Money: TFAL talks with Berna Anat, your Financial Hype Woman', 'link': 'https://thisfilipinoamericanlife.com/2023/04/21/episode-188-money-money-money-tfal-talks-with-berna-anat-your-financial-hype-woman/', 'description': '<p></p>\n<p>Did you know April is financial awareness month? A perfect time to drop this episode with MONEY OUT LOUD author Berna Anat. Berna is your Financial Hype Woman/Manang/Ate/Tita. The moniker “Financial Hype Woman” is her made up way of saying she creates financial education media that lives at @HeyBerna all over the internet. Listen as the crew talks about their own journeys with financial planning, dealing with money, and how that applies to our Filipino American family dynamics. </p>\n<p>', 'guid': 'https://thisfilipinoamericanlife.com/?p=5783', 'datePublished': 1682100004, 'datePublishedPretty': 'April 21, 2023 1:00pm', 'dateCrawled': 1682130917, 'enclosureUrl': 'https://thisfilipinoamericanlife.co

### Add EPISODE FEED ID TO COMPLETE CLASS

In [97]:
import json
import re

class GetValidQueriesFromAPI:
    '''
    A class to extract valid queries from trending and episodesByFeedId api calls for use by CacheAPIOutputs.

    It extracts feed_id, itunes_id and feed_url from an API call to trending podcasts, returning the results from the first podcast 
    with three valid outputs for those fields.

    It then uses feed_id for an api call to episodesByFeedId to get a valid episode id. 
    '''

    def __init__(self):
        self.valid_queries = {}

    def is_valid_id(self, *args):
        for arg in args:
            if not isinstance(arg, int) or arg is None:
                return False
        return True

    def is_valid_url(self, url):
        url_pattern = re.compile(
            r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
        )
        if not isinstance(url, str) or not url_pattern.match(url):
            return False
        else:
            return True

    def get_podcast_fields(self):
        with open('documenting_pi_api/payload_examples/011_trendingPodcasts.json', 'r') as file:
            payload = json.load(file)

        podcasts = payload['feeds']

        for podcast in podcasts:
            feed_id = podcast['id']
            itunes_id = podcast['itunesId']
            feed_url = podcast['url']

            # Stop at the first podcast whose three fields all validate
            if self.is_valid_id(feed_id, itunes_id) and self.is_valid_url(feed_url):
                self.valid_queries = {'feed_id': feed_id, 'itunes_id': itunes_id, 'feed_url': feed_url}
                return self.valid_queries

        # Leave self.valid_queries as a dict so get_episode_field can still add to it
        return "No valid query set found"
            
    def get_episode_field(self):
        with open('documenting_pi_api/payload_examples/006_episodesByFeedId.json', 'r') as file:
            payload = json.load(file)

        episodes = payload['items']

        for episode in episodes:
            episode_id = episode['id']
            if self.is_valid_id(episode_id):
                self.valid_queries['episode_id'] = episode_id
                return self.valid_queries

        return "No valid episode_ids"
            

get_queries = GetValidQueriesFromAPI()
get_queries.get_podcast_fields()
get_queries.get_episode_field()

print(get_queries.valid_queries)




{'feed_id': 555343, 'itunes_id': 73329404, 'feed_url': 'https://feeds.twit.tv/twit.xml', 'episode_id': 15039245759}


### Work from json? NO, LET'S NOT

So, I could download the two necessary JSON files and work from them.

But, I think that's more hassle - it's two API calls that need to be done.

I think we should just go with the original intention - we should build separate functionality for working with the JSON files, if that's what we want to do.

In which case, we just need to update the class to use the API calls rather than the JSON files.
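
One way to keep both options open is to pass the payload source in as a callable, so the extraction logic does not care whether the data came from a file or from the API. This is a sketch only: `load_payload`, `first_feed_id` and the stub are assumptions for illustration, not part of the wrapper.

```python
import json


def load_payload(path):
    """Read a saved payload from disk (the JSON route)."""
    with open(path, 'r') as file:
        return json.load(file)


def first_feed_id(fetch_payload):
    """fetch_payload: any zero-argument callable returning a trending-style payload."""
    payload = fetch_payload()
    for podcast in payload.get('feeds', []):
        feed_id = podcast.get('id')
        if isinstance(feed_id, int):
            return feed_id
    return None


# Stub standing in for either source. With a saved file this could be
#   lambda: load_payload('documenting_pi_api/payload_examples/011_trendingPodcasts.json')
# and with the live API it would be a lambda wrapping the trending call.
def stub_payload():
    return {'feeds': [{'id': 'null'}, {'id': 555343}]}


print(first_feed_id(stub_payload))
```

The stub also makes the function testable without network access or credentials.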

### Replacing JSON in the TRENDING call

Going to try and do this live in the full class I think.

In [10]:
import json
import re

from avoidable_api_wrapper import PodcastIndexConfig, PodcastIndexAPI

class APIQueryExtractor:
    '''
    A class to create a dictionary of valid queries (strings and ids) required for each available API call in python-podcastindex.

    For use by APICacher to create a cached JSON of each API call's output.
    
    The class can extract feed_id, itunes_id, and feed_url from a payload of trending podcasts, returning the results from the first podcast 
    with three valid outputs for those fields.

    It can also extract episode_id from a payload of recent episodes, returning the result from the first episode with a valid output for 
    that field. 

    Two constants: SEARCH_TERM and SEARCH_PERSON complete the required query variables.
    '''

    SEARCH_TERM = 'python'
    SEARCH_PERSON = 'Guido van Rossum'
    MAX = 10 

    def __init__(self):
        self.valid_queries = {'search_term': self.SEARCH_TERM, 'person': self.SEARCH_PERSON, 'max': self.MAX}

    def get_all_valid_fields(self):
        self.get_podcast_fields()
        self.get_episode_field()
        return self.valid_queries

    def get_podcast_fields(self):
        # with open('documenting_pi_api/payload_examples/011_trendingPodcasts.json', 'r') as file:
        #     payload = json.load(file)
        payload = api_instance.index.trendingPodcasts(10)

        podcasts = payload['feeds']

        for podcast in podcasts:
            feed_id = podcast['id']
            itunes_id = podcast['itunesId']
            feed_url = podcast['url']

            if self.is_valid_id(feed_id, itunes_id) and self.is_valid_url(feed_url):
                self.valid_queries.update({'feed_id': feed_id, 'itunes_id': itunes_id, 'feed_url': feed_url})
                return self.valid_queries
        else:
            return "No valid data" 
            
    def get_episode_field(self):
        # with open('documenting_pi_api/payload_examples/006_episodesByFeedId.json', 'r') as file:
        #     payload = json.load(file)
        payload = api_instance.index.recentEpisodes(10)
        
        episodes = payload['items']
        
        for episode in episodes:
            episode_id = episode['id']
            if self.is_valid_id(episode_id):
                self.valid_queries['episode_id'] = episode_id
                return self.valid_queries
        else:
            return "No valid data" 

    def is_valid_id(self, *args):
        for arg in args:
            if not isinstance(arg, int) or arg is None:
                return False
        return True

    def is_valid_url(self, url):
        url_pattern = re.compile(
            r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
        )
        if not isinstance(url, str) or not url_pattern.match(url):
            return False
        else:
            return True
            
config_instance = PodcastIndexConfig()
api_instance = PodcastIndexAPI(config_instance.config)

valid_query_strings = APIQueryExtractor()
valid_query_strings.get_all_valid_fields()
print(valid_query_strings.valid_queries)


{'search_term': 'python', 'person': 'Guido van Rossum', 'max': 10, 'feed_id': 227573, 'itunes_id': 1325018583, 'feed_url': 'https://podnews.net/rss', 'episode_id': 15137129775}


### Working!

So it seems to be working completely fine!

I just need to go back and add these: valid_ids = {'search_term': 'python', 'person': 'Guido van Rossum'}

FINAL CHECK

- [/] Recentepisodes, Trendingpodcasts - nothing
- [x] Search - random string
- [x] Episodesbyperson - famous person
- [x] feed url - Podcastbyfeedurl, Episodesbyfeedurl
- [x] feed id - Podcastbyfeedid, Episodesbyfeedid
- [x] itunes id - Podcastbyitunesid, Episodesbyitunesid
- [x] episode id - Episodebyid

OKAY! So we've now added the constants, which was a bit harder than I expected, but, anyway, it's done, it's there. We have something working!

### Names ok?

So we've got: 

valid_queries = APIQueryStringExtractor()
valid_queries.get_all_valid_fields()

examples_caches = APIOutputsCache(valid_queries)

Let's come back to this when I'm feeling a bit more fresh.

