### Capstone Idea

[GitHub](https://github.com/biborsz/Capstone)

**Problem Statement:** 

Beta.SAM.gov is the successor to several federal government websites, among them fedbizops.gov. The part that interests me is its searchable collection of federal contract solicitations and awards, as well as an information system of past grants. The current system is not yet fully operational. Nonetheless, it allows users to search the collection by key terms, grant ID, etc. One of its problems in the past was that, while it was searchable, it did not have a recommender system. Businesses, especially small businesses, had to spend a considerable amount of time finding relevant grant opportunities. The new system offers that. Creating an account has the advantage of receiving updates. However, the downside of email updates is that they clog up the inbox and do not always generate the expected value:
- it is difficult to know whether there are more opportunities out there or just the ones about which a business gets notified
- reader fatigue may cause businesses not to look beyond the contract opportunities already sent to them, and thus leave potential funding on the table
- someone still needs to sift through a large amount of potentially irrelevant federal grant descriptions

All in all: a user-controlled recommender system would make grant searching more effective for businesses that do not have many resources to allocate toward that activity in the first place.

How it would work:

- based on archival data, it would search for similarities in the following (note: not possible, the archived data does not contain this information)
   - business activity of applicants
   - name of funding agency
   
   - earlier search terms
   - successful earlier grant applications

-> it would give a list of potentially useful search terms, with an option to click on a select few

-> based on the search terms selected - it will further tune the recommender system

-> businesses have an option of rating a recommendation up or down

- based on earlier search terms, one would have access to a longer list of opportunities, which would make browsing possible (right now, it looks to me, that is out of the question)

- companies could search for other relevant information - for example: which companies are applying for similar grants in their business and geographic area
   
   
[API documentation](https://open.gsa.gov/api/get-opportunities-public-api/#user-account-api-key-creation)

[Beta.SAM.gov](https://beta.sam.gov/)

**Methodology:**
   - content based recommender system
   - text vectorizer: *bag-of-words*, *one-hot-encoder* 
     - bag of words: extracts the distinct words of the corpus and uses them as features
     - one-hot-encoder: assigns 1 to each feature found within a row/text and 0 to all other features
   - classifies/recommends based on *cosine similarity*
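
A minimal sketch of the methodology listed above, written in plain Python rather than with any particular library. It builds a bag-of-words vocabulary, encodes each text as a binary (one-hot style) vector, and ranks texts by cosine similarity to a query. The function names (`tokenize`, `build_vocabulary`, `one_hot_vector`, `cosine_similarity`, `recommend`) and the query string are hypothetical; the titles are borrowed from the sample rows shown later in this notebook.

```python
import math
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Bag of words: every distinct token in the corpus becomes a feature."""
    return sorted({token for text in texts for token in tokenize(text)})


def one_hot_vector(text, vocabulary):
    """Binary encoding: 1 if the feature occurs in the text, 0 otherwise."""
    tokens = set(tokenize(text))
    return [1 if word in tokens else 0 for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, texts, top_n=3):
    """Return the top_n (score, text) pairs most similar to the query."""
    vocabulary = build_vocabulary(texts + [query])
    query_vec = one_hot_vector(query, vocabulary)
    scored = [
        (cosine_similarity(query_vec, one_hot_vector(text, vocabulary)), text)
        for text in texts
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# illustrative titles only
titles = ["Cabling Services", "Chemical Purchase", "PAD, RUBBER"]
for score, title in recommend("cabling services", titles):
    print(round(score, 3), title)
```

With binary vectors the score depends only on which words two texts share, not on how often a word appears, so very long descriptions do not dominate simply by repeating a term.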

**Sources:**

https://towardsdatascience.com/how-to-build-a-simple-recommender-system-in-python-375093c3fb7d

https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html

http://recommender-systems.org/content-based-filtering/

https://heartbeat.fritz.ai/recommender-systems-with-python-part-i-content-based-filtering-5df4940bd831

**Ideas for the future:**
 - find database of registered businesses (business activity/ line of business)
 - cross-reference awardees in prior grants to provide a list of competitors

In [1]:
# imports
import pandas as pd
import numpy as np
import requests
import time
import datetime

In [2]:
# set display options 
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', None)

In [None]:
# rewrite code - to read in combined file - 
# separate out dates 
# split string '-'
# turn them into datetime object
# check for oldest date
# turn back into string
# make that postedTo date
# have for loop pull another page
# transform json object
# concat it to existing dataframe
# save it to csv
# I am at 03/02/2020
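
The date steps in the plan above (split on '-', convert to datetime, find the oldest, convert back to a string) can be sketched without pandas as follows. This is an illustration only: `oldest_posted_date` and `sample_dates` are hypothetical, and the sketch assumes dates arrive as `YYYY-MM-DD` strings, as they appear in the `postedDate` column of the sample output.

```python
from datetime import datetime


def oldest_posted_date(posted_dates):
    """Return the oldest date in posted_dates, formatted as MM/DD/YYYY."""
    parsed = []
    for value in posted_dates:
        # split the 'YYYY-MM-DD' string on '-' and build a datetime object
        year, month, day = (int(part) for part in value.split('-'))
        parsed.append(datetime(year, month, day))
    # take the oldest date and turn it back into a string
    return min(parsed).strftime('%m/%d/%Y')


# made-up sample values
sample_dates = ['2020-05-27', '2020-03-02', '2020-05-14']
print(oldest_posted_date(sample_dates))  # 03/02/2020
```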

In [3]:
# define function to return minimum postedDate as string
def get_min_post_date(filename):
    # read in file of downloaded contract opportunities
    df = pd.read_csv(filename)
    # get min posted date
    min_date = pd.to_datetime(df['postedDate']).min()
    return min_date.strftime('%m/%d/%Y')

In [4]:
# define function to return maximum postedDate as string
def get_max_post_date(filename):
    # read in file of downloaded contract opportunities
    df = pd.read_csv(filename)
    # get max posted date
    max_date = pd.to_datetime(df['postedDate']).max()
    return max_date.strftime('%m/%d/%Y')

In [5]:
# define function to return today's date as string
def today():
    now = datetime.datetime.now()
    return now.strftime('%m/%d/%Y')

In [6]:
today()

'05/29/2020'

In [4]:
get_max_post_date('./data/combined.csv')

'05/14/2020'

In [7]:
# pull contract information from api.sam.gov

# set base url
url = 'https://api.sam.gov/prod/opportunities/v1/search'

# create empty list to store results
result = []
# initialize counter
count = 0
# downloaded contract opportunities - file name
# file = './data/combined.csv'

# set postedTo date to today's date by calling today() function
postedTo = today()

# set posted from date
postedFrom = get_max_post_date('./data/combined.csv')

# for loop to pull contracts
for i in range(4):
    count += 1
    
    # do a get request
    req = requests.get(url,
                      params={
                          'api_key': '',
                          'postedFrom': postedFrom,
                          'postedTo': postedTo,
                          'limit': 1000,
                          'offset': (count-1) * 1000
                      })
    
    # add response to result list
    result.append(req)
    
    now = datetime.datetime.now()
    print('Time:', now.strftime("%Y-%m-%d %H:%M:%S"))
    time.sleep(5)
    
    
    
# source for datetime - https://www.w3resource.com/python-exercises/python-basic-exercise-3.php

Time: 2020-05-29 14:53:47
Time: 2020-05-29 14:54:18
Time: 2020-05-29 14:54:48
Time: 2020-05-29 14:55:05
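
The loop above hard-codes four requests. Since each response carries a `totalRecords` field (visible in the keys printed further down), the number of pages could instead be derived from it. The sketch below assumes `totalRecords` reports the total number of matches for the query and reuses the same offset arithmetic as the loop; `page_offsets` is a hypothetical helper, not part of the API.

```python
def page_offsets(total_records, limit=1000):
    """Return the offsets needed to cover total_records rows, limit rows per request."""
    return list(range(0, total_records, limit))


# 3532 is the row count of this pull; it needs four requests
print(page_offsets(3532))  # [0, 1000, 2000, 3000]
```

One way to use it would be to make a first request, read `totalRecords` from its JSON, and then loop over the remaining offsets.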


In [8]:
# unpack list of json objects from response data
ops = []
for item in result:
    print(item.headers)
    ops.append(item.json())

{'Age': '25', 'Content-Type': 'application/hal+json', 'Date': 'Fri, 29 May 2020 18:53:47 GMT', 'Server': 'openresty', 'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'Via': 'http/1.1 api-umbrella (ApacheTrafficServer [cMsSf ])', 'X-Cache': 'MISS', 'X-Forwarded-For': '74.96.156.35, 10.177.16.72, 10.177.55.40, 10.177.55.40', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Set-Cookie': 'citrix_ns_id=LmIrIOAeke8D9noZKI9SNWLoQ8I0002; Domain=.sam.gov; Path=/; Secure; HttpOnly', 'Cache-Control': 'private', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked'}
{'Age': '26', 'Content-Type': 'application/hal+json', 'Date': 'Fri, 29 May 2020 18:54:18 GMT', 'Server': 'openresty', 'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'Via': 'http/1.1 api-umbrella (ApacheTrafficServer [cMsSf ])', 'X-Cache': 'MISS', 'X-Forwarded-For': '74.96.156.35, 10.177.16.72, 10.177.52.13, 1

In [9]:
# parse json objects
ls_data = []
for op in ops:
    print(op.keys())
    df = pd.DataFrame(op['opportunitiesData'])
    ls_data.append(df)
# concatenate once, after all pages have been parsed
data = pd.concat(ls_data)

dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])


In [10]:
data.shape

(3532, 27)

In [11]:
data.tail(10)

Unnamed: 0,noticeId,title,solicitationNumber,department,subTier,office,postedDate,type,baseType,archiveType,archiveDate,typeOfSetAsideDescription,typeOfSetAside,responseDeadLine,naicsCode,classificationCode,active,award,pointOfContact,description,organizationType,officeAddress,placeOfPerformance,additionalInfoLink,uiLink,links,resourceLinks
522,026e9523c1c048b5a61d1c59914e1720,Amendment 0003 DDWG Tension Fabric Structures Overhead Doors Replacement,SP330020Q0061,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),DLA DISTRIBUTION,2020-05-27,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,auto15,2020-06-24,Total Small Business Set-Aside (FAR 19.5),SBA,2020-06-09T15:30:00-04:00,332311.0,J099,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'tameka.drayton@dla.mil', 'phone': '7177706523', 'title': None, 'fullName': 'Tameka L. Drayton'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=026e9523c1c048b5a61d1c59914e1720,OFFICE,"{'zipcode': '17070-5008', 'city': 'NEW CUMBERLAND', 'countryCode': 'USA', 'state': 'PA'}","{'city': {'code': '80508', 'name': 'Warner Robins'}, 'state': {'code': 'GA', 'name': 'Georgia'}, 'zip': '31098', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/026e9523c1c048b5a61d1c59914e1720/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=026e9523c1c048b5a61d1c59914e1720&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/8ea4b24d158045eb98d9943044c476cd/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/431966cce4684be89be9695260570529/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/5a3eef34e9e048528cd9b08bb0c01ef1/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/057e1ff64dda4c5488370d5c025eec93/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/85a36af3eff243faa7789932c0446098/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/db003c83501a4d22b89d92cfb9cc7047/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/18a30f8b10f7476e8c2ae87f555ee6d9/download?api_key=null&token=, 
https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/e53f6fedc7f74f3e816ab72cc3ad9492/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/7be690a096514fe69adf4603e8d68478/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/a8178e3840644402a752fd147190dd70/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/219cfa51a0114a549b01a35615379de1/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/2a98a1f8d28b4f5e86a2eed7cfa5f0c1/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/cd5bef2b182b41d48d1f42e44ca558de/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/395f1fad998b4dd6b5f3e26797b629dc/download?api_key=null&token=]"
523,0265256f47264de8b12634ffad11c53c,Personnel Safety System Conduit Systems and Cables Installation,RFP20-068C,"ENERGY, DEPARTMENT OF","ENERGY, DEPARTMENT OF",PRINCETON LAB - DOE CONTRACTOR,2020-05-27,Solicitation,Solicitation,autocustom,2020-07-31,,,2020-06-30T16:00:00-04:00,238210.0,Z1HA,Yes,,"[{'fax': '6092432021', 'type': 'primary', 'email': 'ccheung@pppl.gov', 'phone': '6092433092', 'title': None, 'fullName': 'Chi Man Cheung'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=0265256f47264de8b12634ffad11c53c,OFFICE,"{'zipcode': '08540', 'city': 'Princeton', 'countryCode': 'USA', 'state': 'NJ'}","{'city': {'code': '60900', 'name': 'Princeton'}, 'state': {'code': 'NJ', 'name': 'New Jersey'}, 'zip': '08543', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/0265256f47264de8b12634ffad11c53c/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=0265256f47264de8b12634ffad11c53c&limit=1'}]",
524,026020b9dc6e4ddcae615043fc147248,NIPR/SIPR connection via LTE/4G/3G cellular networks.,FA4620,DEPT OF DEFENSE,DEPT OF THE AIR FORCE,FA4620 92 CONS LGC,2020-05-27,Sources Sought,Sources Sought,auto15,2020-06-25,,,2020-06-10T13:00:00-07:00,,,Yes,,"[{'fax': '5092478685', 'type': 'primary', 'email': 'edward.campbell.2@us.af.mil', 'phone': '5092474877', 'title': None, 'fullName': 'Ed Campbell'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=026020b9dc6e4ddcae615043fc147248,OFFICE,"{'zipcode': '99011-5320', 'city': 'FAIRCHILD AFB', 'countryCode': 'USA', 'state': 'WA'}","{'city': {'code': 'WA-02', 'name': 'Fairchild AFB'}, 'state': {'code': 'WA', 'name': 'Washington'}, 'zip': '99011', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/026020b9dc6e4ddcae615043fc147248/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=026020b9dc6e4ddcae615043fc147248&limit=1'}]",
525,021aaf39cef440338825536d90daf668,"28--SUPPORT,ANIT ICING",SPE4A720T191Z,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),DLA AVIATION,2020-05-27,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,autocustom,2020-07-04,,,2020-06-04,336412.0,28,Yes,{'awardee': {'location': {}}},"[{'fax': None, 'type': 'primary', 'email': 'DibbsBSM@dla.mil', 'phone': None, 'title': None, 'fullName': 'Questions regarding this solicitation should be emailed to the buyer listed in block 5 of the solicitation document which can be found under the Additional Information link. If the Additional Information link does not work, please go to https://www.dibbs.'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=021aaf39cef440338825536d90daf668,OFFICE,"{'zipcode': '23297', 'city': 'RICHMOND', 'countryCode': 'USA', 'state': 'VA'}",{},,https://beta.sam.gov/opp/021aaf39cef440338825536d90daf668/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=021aaf39cef440338825536d90daf668&limit=1'}]",
526,01ffcdfa61754634982213b877f393f6,Chemical Purchase,F2KBAA0098A102,DEPT OF DEFENSE,DEPT OF THE AIR FORCE,FA9453 DET 8 AFRL PKV8,2020-05-27,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,auto15,2020-06-13,Total Small Business Set-Aside (FAR 19.5),SBA,2020-05-29T16:00:00-06:00,325998.0,6810,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'robert.jefferis.1@us.af.mil', 'phone': '', 'title': None, 'fullName': 'Robert Peter Jefferis'}, {'fax': None, 'type': 'secondary', 'email': 'jeffery.martinez.1@us.af.mil', 'phone': None, 'title': None, 'fullName': 'Jeffery Martinez'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=01ffcdfa61754634982213b877f393f6,OFFICE,"{'zipcode': '87117', 'city': 'KIRTLAND AFB', 'countryCode': 'USA', 'state': 'NM'}","{'city': {'code': '21586', 'name': 'Edwards'}, 'state': {'code': 'CA', 'name': 'California'}, 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/01ffcdfa61754634982213b877f393f6/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=01ffcdfa61754634982213b877f393f6&limit=1'}]",[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/cbd9642be1c14047ada78f954c895bc7/download?api_key=null&token=]
527,018f63b33b894e77a679de7d44a7dac6,"PAD, RUBBER",N0010420QAE07,DEPT OF DEFENSE,DEPT OF THE NAVY,NAVSUP WEAPON SYSTEMS SUPPORT MECH,2020-05-27,Solicitation,Solicitation,auto15,2020-07-13,Total Small Business Set-Aside (FAR 19.5),SBA,2020-06-28T16:30:00-04:00,336415.0,1440,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'JOSHUA.D.MARTIN4@NAVY.MIL', 'phone': '717-605-4356', 'title': None, 'fullName': 'Telephone: 717-605-4356'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=018f63b33b894e77a679de7d44a7dac6,OFFICE,"{'zipcode': '17050-0788', 'city': 'MECHANICSBURG', 'countryCode': 'USA', 'state': 'PA'}",,,https://beta.sam.gov/opp/018f63b33b894e77a679de7d44a7dac6/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=018f63b33b894e77a679de7d44a7dac6&limit=1'}]",
528,018a673b2bc84ceab934818f735e5127,Cabling Services,36C26020Q0415,"VETERANS AFFAIRS, DEPARTMENT OF","VETERANS AFFAIRS, DEPARTMENT OF",260-NETWORK CONTRACT OFFICE 20 (36C260),2020-05-27,Presolicitation,Presolicitation,autocustom,2020-09-22,Service-Disabled Veteran-Owned Small Business (SDVOSB) Set-Aside (FAR 19.14),SDVOSBC,2020-06-22T17:00:00-07:00,238210.0,D301,Yes,,"[{'fax': None, 'type': 'primary', 'email': 'laura.davis5@va.gov', 'phone': None, 'title': None, 'fullName': 'Laura Davis'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=018a673b2bc84ceab934818f735e5127,OFFICE,"{'zipcode': '98662', 'city': 'VANCOUVER', 'countryCode': 'USA', 'state': 'WA'}",,,https://beta.sam.gov/opp/018a673b2bc84ceab934818f735e5127/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=018a673b2bc84ceab934818f735e5127&limit=1'}]",
529,0169b4e38f524d0fa5e7ef50f39300f8,J041--Ice Machines and Retherm Unit Service Contract,36C24220Q0669,"VETERANS AFFAIRS, DEPARTMENT OF","VETERANS AFFAIRS, DEPARTMENT OF",242-NETWORK CONTRACT OFFICE 02 (36C242),2020-05-27,Sources Sought,Sources Sought,autocustom,2020-07-28,,,2020-05-29T14:00:00-04:00,811310.0,J041,Yes,{'awardee': {}},"[{'fax': '', 'type': 'primary', 'email': 'natalie.acevedo@va.gov', 'phone': '718 584 9000 x4356', 'title': 'Natalie Acevedo', 'fullName': 'Natalie Acevedo'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=0169b4e38f524d0fa5e7ef50f39300f8,OFFICE,"{'zipcode': '12208', 'city': 'ALBANY', 'countryCode': 'USA', 'state': 'NY'}","{'streetAddress': 'New Jersey VA Healthcare System East Orange and Lyons Campus', 'streetAddress2': '', 'city': {'name': ''}, 'state': {'name': ''}, 'zip': '', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/0169b4e38f524d0fa5e7ef50f39300f8/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=0169b4e38f524d0fa5e7ef50f39300f8&limit=1'}]",[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/a5f2499245a945ca8d8432cf72f969b4/download?api_key=null&token=]
530,0132333ed0c740d1b7d239644aa563e9,"70--OPTICAL READER,DATA",SPE8EN20T2793,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),DLA TROOP SUPPORT,2020-05-27,Award Notice,Award Notice,auto15,2020-06-11,,,,334118.0,70,Yes,"{'date': '2020-05-27', 'number': 'SPE8EN20P0653', 'amount': '234725.00', 'awardee': {'name': 'Avidcard, Inc. 9225 Dowdy Dr Ste 210 San Diego 92126-6367 US', 'location': {}, 'duns': '796603095', 'cageCode': '4QJY2'}}","[{'fax': None, 'type': 'primary', 'email': 'Janet.Linsky@dla.mil', 'phone': None, 'title': None, 'fullName': 'Janet LINSKY215-737-2162'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=0132333ed0c740d1b7d239644aa563e9,OFFICE,"{'zipcode': '19111-5096', 'city': 'PHILADELPHIA', 'countryCode': 'USA', 'state': 'PA'}",{},,https://beta.sam.gov/opp/0132333ed0c740d1b7d239644aa563e9/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=0132333ed0c740d1b7d239644aa563e9&limit=1'}]",
531,00fd71c7ddb64f208923a690cedae2d5,Joint Light Tactical Vehicle (JLTV) Family of Vehicles (FoV) Follow-On,W56HZV-20-R-0072,DEPT OF DEFENSE,DEPT OF THE ARMY,W4GG HQ US ARMY TACOM,2020-05-27,Special Notice,Special Notice,autocustom,2022-12-31,,,2020-06-02T16:00:00-04:00,336212.0,2355,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'karl.j.boedecker.civ@mail.mil', 'phone': '', 'title': None, 'fullName': 'Karl Boedecker'}, {'fax': '', 'type': 'secondary', 'email': 'christopher.a.valevich.civ@mail.mil', 'phone': '5862396029', 'title': None, 'fullName': 'Christopher A. Valevich'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=00fd71c7ddb64f208923a690cedae2d5,OFFICE,"{'zipcode': '48397-5000', 'city': 'WARREN', 'countryCode': 'USA', 'state': 'MI'}",,,https://beta.sam.gov/opp/00fd71c7ddb64f208923a690cedae2d5/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=00fd71c7ddb64f208923a690cedae2d5&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/abb9850ed627454b87aa7746f0da9697/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/f542ee7355f448479f365b17ca1eeb94/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/4316aeefb80a4ffab4ed500aebefc398/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/668a62c5afee4ab0a28d0e7cc1f1b174/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/76d6082671dd41d184b8d1a41c22e6ff/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/ee4088a5439d402a866b75d037aedfee/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/ca52714bab624f518dc1bea455b48df7/download?api_key=null&token=, 
https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/e3d0b12d99fd42b58a5d86ecad6093ac/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/3478e1d944ac4a8caf32cf393b60f172/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/a0b01b7c48764bdab63bab7868318c55/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/9e28b5c9919744158cc3bac4dec56309/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/4fdc47c316114d3cb007ff2d46c947ab/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/5341429e1e6349dbb827a800ffbeecb1/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/40754176e4eb4b118d9b96aeadb167e7/download?api_key=null&token=]"


In [12]:
data.duplicated('noticeId').sum()

0

In [15]:
# data.drop_duplicates('noticeId', inplace=True)

In [13]:
data.to_csv('./data/5_29_pull.csv', index=False)

In [14]:
df1 = pd.read_csv('./data/combined.csv')
df2 = pd.read_csv('./data/5_29_pull.csv')

In [17]:
df2.shape

(10000, 27)

In [15]:
df = pd.concat([df1, df2])

In [16]:
df.shape

(43454, 27)

In [17]:
df.to_csv('./data/combined.csv', index=False)
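
Because `postedFrom` is set to the latest `postedDate` already in `combined.csv`, notices posted on that date may be pulled a second time if the API treats the date range as inclusive (an assumption, not verified here). The duplicate check earlier only covers the new pull, not the combined file. A minimal sketch of merging by `noticeId`, using plain lists of dicts for illustration; `merge_unique` and the sample rows are hypothetical.

```python
def merge_unique(existing_rows, new_rows, key='noticeId'):
    """Combine two lists of row dicts, keeping the first row seen for each key."""
    seen = set()
    merged = []
    for row in existing_rows + new_rows:
        if row[key] in seen:
            continue  # already present, skip the repeated notice
        seen.add(row[key])
        merged.append(row)
    return merged


# made-up rows: notice 'b' appears in both the old file and the new pull
old = [{'noticeId': 'a'}, {'noticeId': 'b'}]
new = [{'noticeId': 'b'}, {'noticeId': 'c'}]
print([row['noticeId'] for row in merge_unique(old, new)])  # ['a', 'b', 'c']
```

With pandas, the same effect should be achievable by calling `drop_duplicates('noticeId')` on the concatenated DataFrame before saving, as in the commented-out cell above.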