Create enriched relatedBills JSON #7

Closed
aih opened this issue Aug 26, 2020 · 9 comments
Labels: enhancement (New feature or request)

Comments

Collaborator

aih commented Aug 26, 2020

This is an extension of #6 and should be combined into PR #5

Generate a JSON file that contains rich information about bill similarity. The JSON will have bill numbers as keys, and each value will be an array of objects, one per related bill. Each object will record what the two bills share. So, for example:

116s130: [
  { billCongressTypeNumber: '116hr201',
    cosponsors: [bioguide_id1, bioguide_id2],
    titles: ['Shared Title 1', 'Shared Title 2', etc.],
    similar_title: ['Similar (nonidentical) Title 1', 'Similar (nonidentical) Title 2', etc.]
  }...
],
Collaborator Author

aih commented Aug 26, 2020

To start, create the JSON above with only the billCongressTypeNumber and titles array:

116s130: [
  { billCongressTypeNumber: '116hr201',
    titles: ['Shared Title 1', 'Shared Title 2', etc.]
  }...
],

aih added the enhancement (New feature or request) label Aug 26, 2020
Contributor

adamwjo commented Aug 26, 2020

I'm a little confused about the billCongressTypeNumber. The top-level key is also a billCongressTypeNumber, right? Where does that value come from?

As for the titles, I imagine I can use the titles index for this? Or would it make sense to use billsMeta.json?

Collaborator Author

aih commented Aug 26, 2020

This is an extension of what you've done in sameTitles.json to support other kinds of similarity measures. The reason to do this is that we can then ask 1) what are all of the bills that are related to 116hjres56, and 2) how are they related (what titles are exactly the same, then later what titles are almost the same, what cosponsors they share, etc.).

In sameTitles.json we have:

{
"116hjres58": {"same_titles": ["116hjres58"]}, 
...
"116hjres56": {"same_titles": ["116hjres56", "115hjres142"]}, 
...
}

This will become:

{
"116hjres58": [
  {
    billCongressTypeNumber: "116hjres58",
    titles: ["Title 1", "Title 2"] // in this case, it is all of the titles of this bill, since this is the 'identity' item.
  }
],
...
"116hjres56": [
  {
    billCongressTypeNumber: "116hjres56",
    titles: ["Title of this bill", "Another title of this bill"] // again, this is the 'identity' item for 116hjres56
  },
  {
    billCongressTypeNumber: "115hjres142",
    titles: ["Shared Title 1", "Shared Title 2"] // this is the list of titles that are common between 116hjres56 and 115hjres142
  }
],
...
}
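
As a rough illustration of the two questions above (which bills are related, and how), here is a minimal sketch that assumes the structure has been loaded into a Python dict. The function name related_to and the sample data are hypothetical, and the titles are placeholders rather than real bill titles.

```python
# Hypothetical sample in the proposed shape; titles are placeholders.
related_bills = {
    "116hjres56": [
        {"billCongressTypeNumber": "116hjres56",
         "titles": ["Title of this bill", "Another title of this bill"]},
        {"billCongressTypeNumber": "115hjres142",
         "titles": ["Shared Title 1", "Shared Title 2"]},
    ],
}


def related_to(bill_number, index):
    """Map each related bill number to the titles it shares with bill_number.

    The 'identity' item (the bill itself) is skipped.
    """
    return {
        item["billCongressTypeNumber"]: item["titles"]
        for item in index.get(bill_number, [])
        if item["billCongressTypeNumber"] != bill_number
    }


print(related_to("116hjres56", related_bills))
```

Other similarity measures (cosponsors, similar titles) would add keys to each object without changing how the lookup works.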

Contributor

adamwjo commented Aug 26, 2020

So far I'm up to this:

def getSameTitles():
    titlesIndex = loadTitlesIndex()
    sameTitlesIndex = {}
    for title, bills in titlesIndex.items():
        for bill in bills:
            if not sameTitlesIndex.get(bill):
                sameTitlesIndex[bill] = []
                for bill in bills:
                    billObj = {
                        'billCongressTypeNumber': bill,
                        'titles': [title]
                    }
                    objlist = sameTitlesIndex.get(bill, [])
                    objlist.append(billObj)
            # else:
            #     current_same_titles = sameTitlesIndex[bill].get('same_titles')
            #     if current_same_titles:
            #         combined_bills = list(set(current_same_titles + bills))
            #         sameTitlesIndex[bill]['same_titles'] = combined_bills

I think I am running into the same problem where I am creating a new object for every bill. Here's my logic so far:

  • Check the sameTitles index for the bill num
  • If the bill num is not present, create a new key with that bill num, with the value being a list
  • Loop over the bills and create an object for each billnum containing the following, and append it to the list:
{
  billCongressTypeNumber: "116hjres58",
  titles: ["Title 1", "Title 2"]
}
  • If the bill number is present, append its title to the titles list

Collaborator Author

aih commented Aug 26, 2020

Start with the billsMeta.json document, which has the basic structure you need: a Dict with all of the bill numbers.

To load it, import the loader: from billdata import loadBillsMeta.

Then make your own that looks like this and save it to relatedBills.json:

{
   "116hjres58": [],
   "116hjres56": []
}

You can do this with a Dict comprehension:

billsRelated = {key: [] for key in billMeta.keys()}

Start by saving that Dict (with empty arrays as values) into relatedBills.json.
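
A minimal sketch of that first step, under stated assumptions: bills_meta below is a stand-in for whatever loadBillsMeta() returns (only its keys matter here), and empty_related and save_related are hypothetical helper names, not functions from the project.

```python
import json

# Stand-in for the Dict returned by loadBillsMeta(); the values are placeholders.
bills_meta = {"116hjres58": {}, "116hjres56": {}}


def empty_related(meta):
    """Create one empty list per bill number found in the metadata."""
    return {key: [] for key in meta}


def save_related(related, path):
    """Write the Dict to a JSON file, e.g. relatedBills.json."""
    with open(path, "w") as f:
        json.dump(related, f, indent=2)


bills_related = empty_related(bills_meta)
serialized = json.dumps(bills_related, indent=2)
print(serialized)
```

Calling save_related(bills_related, "relatedBills.json") would write the same content to disk.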

Contributor

adamwjo commented Aug 26, 2020

Ahhh, I see. Then loop over the titlesIndex, creating an object for each of the related billnums and then append it to the list for that billnum?

Collaborator Author

aih commented Aug 26, 2020

Yes. You can do it with the loops you have above, but starting with a full list of bills can help you think about what is happening in each loop. You'll still have to handle multiple bill numbers in multiple titles.

Think through what is happening at each stage. Work it out on paper with a small sample set of titlesIndex.json:

"Science Appropriations Act, 2019": ["116hjres31", "116hr21", "116hr648", "115s3072", "115hr5952"], 
"Transportation, Housing and Urban Development, and Related Agencies Appropriations Act, 2019": ["116hjres31", "116hr21", "116hr267", "116hr648", "115s3023", "115hr6147", "115hr6072"], 
"Making further continuing appropriations for the Department of Homeland Security for fiscal year 2019, and for other purposes.": ["116hjres31", "116hjres1"],

What happens:

  1. In the first for bill in bills: loop, when sameTitlesIndex.get('116hjres31') is an empty array?

  2. What is the result of both loops when bill is 116hjres31? (Also, definitely rename bill in the second loop to bill2, or you will confuse yourself and Python with the variable.)

  3. What is the result of the first loop on the second item (116hr21)?
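
The renaming advice can be checked with a small experiment. This is only a sketch of the shadowing effect, using the two bill numbers from the third sample title; the list names are made up for the demonstration.

```python
bills = ["116hjres31", "116hjres1"]

# Reusing the name `bill` in the inner loop overwrites the outer value.
shadowed = []
for bill in bills:
    for bill in bills:  # same name as the outer loop variable, on purpose
        pass
    # After the inner loop, `bill` is the LAST inner item, not the outer one.
    shadowed.append(bill)

# With a distinct inner name, the outer variable keeps its value.
distinct = []
for bill in bills:
    for bill2 in bills:
        pass
    distinct.append(bill)

print(shadowed)  # ['116hjres1', '116hjres1']
print(distinct)  # ['116hjres31', '116hjres1']
```

Python does not raise an error here, which is why the bug is easy to miss: any code after the inner loop silently works on the wrong bill.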

Contributor

adamwjo commented Aug 26, 2020

Made a PR with the updated logic. I believe it's most of the way there. Very appreciative of your help!

aih changed the title from "Create enriched relatedBills JSon" to "Create enriched relatedBills JSON" Aug 31, 2020
Collaborator Author

aih commented Sep 2, 2020

I refactored the function to create relatedBills.json in one pass, with an outer and inner loop. My comments on the changes are here: #8

Closing.
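
For readers without access to #8, the following is one possible way to build the structure discussed in this thread. It is not the refactor referenced above, only a sketch under assumptions: build_related is a hypothetical name, and titles_index is a trimmed version of the sample (the first title's bill list is shortened to two entries).

```python
from collections import defaultdict

# Trimmed sample of titlesIndex.json; the first bill list is shortened.
titles_index = {
    "Science Appropriations Act, 2019": ["116hjres31", "116hr21"],
    "Making further continuing appropriations for the Department of Homeland "
    "Security for fiscal year 2019, and for other purposes.": ["116hjres31", "116hjres1"],
}


def build_related(index):
    """Build {bill: [{billCongressTypeNumber, titles}, ...]} from a titles index."""
    # shared[bill][bill2] collects every title that lists both bills.
    shared = defaultdict(lambda: defaultdict(list))
    for title, bills in index.items():
        for bill in bills:        # outer loop over bills with this title
            for bill2 in bills:   # inner loop; bill2 == bill gives the identity item
                shared[bill][bill2].append(title)
    return {
        bill: [
            {"billCongressTypeNumber": other, "titles": titles}
            for other, titles in others.items()
        ]
        for bill, others in shared.items()
    }


related = build_related(titles_index)
for item in related["116hjres31"]:
    print(item["billCongressTypeNumber"], len(item["titles"]))
```

Because each bill is paired with itself in the inner loop, the identity item ends up holding all of that bill's titles, as described earlier in the thread. The sketch assumes a bill number appears at most once per title.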

aih closed this as completed Sep 2, 2020