### Capstone Idea

[GitHub](https://github.com/biborsz/Capstone)

**Problem Statement:** 

Fedbizopps.gov used to be the website where small businesses could search for federal contract opportunities. Its collection of solicitations and award notifications was searchable by key terms, but, as far as I know, it offered no way to find opportunities of interest by similarity rather than by keyword. As a result, looking for contract opportunities took considerable time, which could strain lightly staffed small businesses. Its successor, Beta.SAM.gov, although not yet fully operational, provides a wide array of filtering options in addition to key term search and browsing. Those new to the system would still benefit, however, from an application that recommends opportunities of interest based on topic and/or wording similarity.

The purpose of this project is twofold:
1. Stretching the limits of natural language processing, build an application that recommends contract notifications of interest based on user up- or downvotes.
2. Observe how well a content-based recommender system finds notifications that are not only similar but also relevant.


- The application would give a list of potentially useful search terms, with an option to click on a select few.
- The search terms selected would further tune the recommender system.
- Businesses would have the option of rating a recommendation up or down.
- Based on earlier search terms, one would have access to a longer list of opportunities, which would make browsing possible (right now, as far as I can tell, that is out of the question).
- Companies could search for other relevant information, for example which companies are applying for similar grants in their business and geographic area.
   
   
[API documentation](https://open.gsa.gov/api/get-opportunities-public-api/#user-account-api-key-creation)

[Beta.SAM.gov](https://beta.sam.gov/)

**Methodology:**
   - content-based recommender system
   - text vectorizer: *bag-of-words*, *one-hot encoder*
     - bag-of-words: extracts words from the corpus as features
     - one-hot encoder: assigns 1 to each feature found in a row/text and 0 to all other features
   - classifies/recommends based on *cosine similarities*
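
The methodology above can be illustrated with a minimal standard-library sketch. It is not the project's implementation: the function names and the sample titles are made up for illustration, and tokenization is assumed to be a plain whitespace split.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def one_hot(text):
    """Binary variant: 1 for every token present, regardless of count."""
    return Counter(set(text.lower().split()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse token vectors."""
    dot = sum(vec_a[token] * vec_b[token] for token in vec_a if token in vec_b)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(liked_title, candidates, vectorizer=bag_of_words):
    """Rank candidate titles by similarity to a title the user upvoted."""
    liked_vec = vectorizer(liked_title)
    scored = [(cosine_similarity(liked_vec, vectorizer(c)), c) for c in candidates]
    return sorted(scored, reverse=True)


# Made-up titles, for illustration only
titles = [
    "roof replacement hospital building",
    "hvac cleaning services",
    "tree removal services",
]
ranking = recommend("replace roofing hospital", titles)
```

Only the shared token "hospital" contributes to the top score here; "roofing" and "roof" count as different features, which hints at why stemming or a richer vectorizer might be worth trying later.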

**Sources:**

https://towardsdatascience.com/how-to-build-a-simple-recommender-system-in-python-375093c3fb7d

https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html

http://recommender-systems.org/content-based-filtering/

https://heartbeat.fritz.ai/recommender-systems-with-python-part-i-content-based-filtering-5df4940bd831

**Ideas for the future:**
 - find a database of registered businesses (business activity/line of business)
 - cross-reference awardees in prior grants to provide a list of competitors

In [1]:
# imports
import pandas as pd
import numpy as np
import requests
import time
import datetime

In [2]:
# set display options 
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', None)

In [None]:
# rewrite code - to read in combined file - 
# separate out dates 
# split string '-'
# turn them into datetime object
# check for oldest date
# turn back into string
# make that postedTo date
# have for loop pull another page
# transform json object
# concat it to existing dataframe
# save it to csv
# I am at 03/02/2020
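
The date-handling steps in the plan above could look roughly like the following sketch. It assumes `postedDate` values arrive as `YYYY-MM-DD` strings and that the query expects `MM/DD/YYYY`; the helper name and the sample values are hypothetical.

```python
from datetime import datetime


def oldest_date_as_query_string(posted_dates):
    """Return the earliest 'YYYY-MM-DD' date, formatted as 'MM/DD/YYYY'."""
    parsed = [datetime.strptime(d, "%Y-%m-%d") for d in posted_dates]
    return min(parsed).strftime("%m/%d/%Y")


# Illustrative values in the same format as the postedDate column
sample = ["2020-05-29", "2020-03-02", "2020-05-14"]
posted_to = oldest_date_as_query_string(sample)
```

Parsing with `strptime` sidesteps the manual split on `'-'` and lets `min` compare real dates rather than strings.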

In [3]:
# define function to return minimum postedDate as string
def get_min_post_date(filename):
    # read in file of downloaded contract opportunities
    df = pd.read_csv(filename)
    # get min posted date
    min_date = pd.to_datetime(df['postedDate']).min()
    return min_date.strftime('%m/%d/%Y')

In [4]:
# define function to return maximum postedDate as string
def get_max_post_date(filename):
    # read in file of downloaded contract opportunities
    df = pd.read_csv(filename)
    # get max posted date
    max_date = pd.to_datetime(df['postedDate']).max()
    return max_date.strftime('%m/%d/%Y')

In [5]:
# define function to return today's date as string
def today():
    now = datetime.datetime.now()
    return now.strftime('%m/%d/%Y')

In [6]:
today()

'06/02/2020'

In [4]:
get_max_post_date('./data/combined.csv')

'05/14/2020'

In [7]:
# pull contract information from api.sam.gov

# set base url
url = 'https://api.sam.gov/prod/opportunities/v1/search'

# create empty list to store results
result = []
# initialize counter
count = 0
# downloaded contract opportunities - file name
# file = './data/combined.csv'

# set postedTo date to today's date by calling today() function
postedTo = today()

# set posted from date
postedFrom = get_max_post_date('./data/combined.csv')

# for loop to pull contracts
for i in range(10):
    count += 1
    
    # do a get request
    req = requests.get(url,
                      params={
                          'api_key': '',
                          'postedFrom': postedFrom,
                          'postedTo': postedTo,
                          'limit': 1000,
                          'offset': (count-1) * 1000
                      })
    
    # add response to result list
    result.append(req)
    
    now = datetime.datetime.now()
    print('Time:', now.strftime("%Y-%m-%d %H:%M:%S"))
    time.sleep(5)
    
    
    
# source for datetime - https://www.w3resource.com/python-exercises/python-basic-exercise-3.php

Time: 2020-06-02 21:38:44
Time: 2020-06-02 21:39:11
Time: 2020-06-02 21:39:35
Time: 2020-06-02 21:40:01
Time: 2020-06-02 21:40:15
Time: 2020-06-02 21:40:21
Time: 2020-06-02 21:40:30
Time: 2020-06-02 21:40:36
Time: 2020-06-02 21:40:44
Time: 2020-06-02 21:40:50
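
The loop above always issues ten requests. A possible refinement, sketched below, stops as soon as the reported `totalRecords` has been collected. The `fetch_page` callable is hypothetical and stands in for the `requests.get` call; the sketch follows this notebook's convention of passing a record offset, which should be checked against the API documentation.

```python
def fetch_all_pages(fetch_page, limit=1000, max_pages=10):
    """Collect records page by page until the reported total is reached.

    fetch_page(offset, limit) is a hypothetical callable that returns the
    decoded JSON body of one search response as a dict.
    """
    records = []
    for page in range(max_pages):
        body = fetch_page(page * limit, limit)
        batch = body.get("opportunitiesData", [])
        records.extend(batch)
        # Stop when the page is empty or every reported record is collected
        if not batch or len(records) >= body.get("totalRecords", 0):
            break
    return records


# Stand-in for the real request, so the sketch runs offline
def fake_fetch(offset, limit):
    all_ids = [{"noticeId": str(i)} for i in range(2500)]
    return {
        "totalRecords": len(all_ids),
        "opportunitiesData": all_ids[offset:offset + limit],
    }


pulled = fetch_all_pages(fake_fetch)
```

With the stand-in above, three pages are fetched instead of ten, and each skipped request also saves one `time.sleep(5)` pause.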


In [8]:
# unpack list of json objects from response data
ops = []
for item in result:
    print(item.headers)
    ops.append(item.json())

{'Age': '20', 'Content-Type': 'application/hal+json', 'Date': 'Wed, 03 Jun 2020 01:38:44 GMT', 'Server': 'openresty', 'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'Via': 'http/1.1 api-umbrella (ApacheTrafficServer [cMsSf ])', 'X-Cache': 'MISS', 'X-Forwarded-For': '74.96.156.35, 10.177.16.72, 10.177.54.200, 10.177.54.200', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Set-Cookie': 'citrix_ns_id=f+6nyi/93TsbB4ymY/y/Sxn+npQ0000; Domain=.sam.gov; Path=/; Secure; HttpOnly', 'Cache-Control': 'private', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked'}
{'Age': '20', 'Content-Type': 'application/hal+json', 'Date': 'Wed, 03 Jun 2020 01:39:10 GMT', 'Server': 'openresty', 'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'Via': 'http/1.1 api-umbrella (ApacheTrafficServer [cMsSf ])', 'X-Cache': 'MISS', 'X-Forwarded-For': '74.96.156.35, 10.177.16.72, 10.177.54.200

In [9]:
# parse json objects
ls_data = []
for op in ops:
    print(op.keys())
    ls_data.append(pd.DataFrame(op['opportunitiesData']))
# concatenate once, after all pages are parsed
data = pd.concat(ls_data)

dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])


In [10]:
data.shape

(4287, 27)

In [11]:
data.tail(10)

Unnamed: 0,noticeId,title,solicitationNumber,department,subTier,office,postedDate,type,baseType,archiveType,archiveDate,typeOfSetAsideDescription,typeOfSetAside,responseDeadLine,naicsCode,classificationCode,active,award,pointOfContact,description,organizationType,officeAddress,placeOfPerformance,additionalInfoLink,uiLink,links,resourceLinks
277,0354455d737643bc98f6a553922d64a3,Triton UAS MOB ECP Development,CJA-15_24847_N00019-15-G-0026_N00019-20-F-0865,DEPT OF DEFENSE,DEPT OF THE NAVY,NAVAL AIR SYSTEMS COMMAND,2020-05-29,Justification,Justification,auto30,2020-06-29,,,,,AC15,Yes,"{'date': '2020-05-15', 'number': 'N00019-15-G-0026'}","[{'fax': '', 'type': 'primary', 'email': 'jessica.l.guy@navy.mil', 'phone': '', 'title': None, 'fullName': 'Jessica L. Guy'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=0354455d737643bc98f6a553922d64a3,OFFICE,"{'zipcode': '20670-5000', 'city': 'PATUXENT RIVER', 'countryCode': 'USA', 'state': 'MD'}",,,https://beta.sam.gov/opp/0354455d737643bc98f6a553922d64a3/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=0354455d737643bc98f6a553922d64a3&limit=1'}]",[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/6c67a84e117d4f038bdbd16e5101ea98/download?api_key=null&token=]
278,034d00eff98c4da6b553072cf0ba3d37,"53--BOLT,MACHINE",SPE5E820U0029,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),DLA TROOP SUPPORT,2020-05-29,Award Notice,Award Notice,auto15,2020-06-13,,,,332722.0,53,Yes,"{'date': '2020-05-29', 'number': 'SPE5E420D60CC', 'amount': '250000.00', 'awardee': {'name': 'Alamo Aircraft, Ltd. 2538 SW 36th St San Antonio 78237-4032 US', 'location': {}, 'duns': '027005180', 'cageCode': '55345'}}","[{'fax': None, 'type': 'primary', 'email': 'Lisa.Diplacido@dla.mil', 'phone': None, 'title': None, 'fullName': 'LISA DIPLACIDODSN444-0831'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=034d00eff98c4da6b553072cf0ba3d37,OFFICE,"{'zipcode': '19111', 'city': 'PHILADELPHIA', 'countryCode': 'USA', 'state': 'PA'}",{},,https://beta.sam.gov/opp/034d00eff98c4da6b553072cf0ba3d37/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=034d00eff98c4da6b553072cf0ba3d37&limit=1'}]",
279,0334dc835194483fa1888f4eba35ab9f,Replace Roofing Hospital & Education,15B41720B00000001,"JUSTICE, DEPARTMENT OF",FEDERAL PRISON SYSTEM / BUREAU OF PRISONS,FAO,2020-05-29,Presolicitation,Presolicitation,autocustom,2020-12-30,Total Small Business Set-Aside (FAR 19.5),SBA,2020-07-22T14:00:00-05:00,238160.0,Z2FF,Yes,,"[{'fax': '9723524545', 'type': 'primary', 'email': 'mrmorgan@bop.gov', 'phone': '9723524504', 'title': None, 'fullName': 'Michelle R Morgan'}, {'fax': '', 'type': 'secondary', 'email': 'jkercheval@bop.gov', 'phone': '9723524520', 'title': None, 'fullName': 'Jeffrey P. Kercheval'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=0334dc835194483fa1888f4eba35ab9f,OFFICE,"{'zipcode': '75051', 'city': 'GRAND PRAIRIE', 'countryCode': 'USA', 'state': 'TX'}","{'city': {'code': '53920', 'name': 'Milan'}, 'state': {'code': 'MI', 'name': 'Michigan'}, 'zip': '48160', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/0334dc835194483fa1888f4eba35ab9f/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=0334dc835194483fa1888f4eba35ab9f&limit=1'}]",
280,03081065fada472d98a28c8d0f89de5b,HVAC Horizontal,F1C0RH0125AW02,DEPT OF DEFENSE,DEPT OF THE AIR FORCE,FA5004 354 CONS PK,2020-05-29,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,auto15,2020-06-20,Total Small Business Set-Aside (FAR 19.5),SBA,2020-06-05T14:00:00-08:00,33599.0,5975,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'jarred.julian@us.af.mil', 'phone': '9073773326', 'title': None, 'fullName': 'Jarred W. Julian'}, {'fax': '', 'type': 'secondary', 'email': 'brandon.tracer@us.af.mil', 'phone': '9073775309', 'title': None, 'fullName': 'Brandon Tracer'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=03081065fada472d98a28c8d0f89de5b,OFFICE,"{'zipcode': '99702-2200', 'city': 'EIELSON AFB', 'countryCode': 'USA', 'state': 'AK'}","{'city': {'code': 'AK-02', 'name': 'Eielson AFB'}, 'state': {'code': 'AK', 'name': 'Alaska'}, 'zip': '99702', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/03081065fada472d98a28c8d0f89de5b/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=03081065fada472d98a28c8d0f89de5b&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/8b0869ac01a04815a20e1e84ad3364a6/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/2b97d65892994dfb9df8e77f84258992/download?api_key=null&token=]"
281,02b38c2e32f54f73a2ec8c520ea43495,Remanufacture of B-1B Horizontal Stabilizer Shaker Gearbox and Motor Assembly,FA811820R0041,DEPT OF DEFENSE,DEPT OF THE AIR FORCE,FA8118 AFSC PZABB,2020-05-29,Presolicitation,Presolicitation,autocustom,2021-05-29,Total Small Business Set-Aside (FAR 19.5),SBA,,,1680,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'kyle.newcomb@us.af.mil', 'phone': '4057394468', 'title': None, 'fullName': 'Kyle Newcomb'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=02b38c2e32f54f73a2ec8c520ea43495,OFFICE,"{'zipcode': '73145-3303', 'city': 'TINKER AFB', 'countryCode': 'USA', 'state': 'OK'}",,,https://beta.sam.gov/opp/02b38c2e32f54f73a2ec8c520ea43495/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=02b38c2e32f54f73a2ec8c520ea43495&limit=1'}]",[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/a40ac55d5a404a14a85f246adbced731/download?api_key=null&token=]
282,019d96aff2e141f6be46dda1b0084cae,2310--692-20-3-9963-0077 FMS Utility Vehicles Van Box and Utility Bed,36C26020Q0455,"VETERANS AFFAIRS, DEPARTMENT OF","VETERANS AFFAIRS, DEPARTMENT OF",260-NETWORK CONTRACT OFFICE 20 (36C260),2020-05-29,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,autocustom,2020-07-09,Service-Disabled Veteran-Owned Small Business (SDVOSB) Set-Aside (FAR 19.14),SDVOSBC,2020-06-09T17:00:00-07:00,336112.0,2310,Yes,{'awardee': {}},"[{'fax': '', 'type': 'primary', 'email': 'Rex.Maughan@va.gov', 'phone': '360-816-2769', 'title': 'Rex D. Maughan', 'fullName': ' rex.maughan@va.gov'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=019d96aff2e141f6be46dda1b0084cae,OFFICE,"{'zipcode': '98662', 'city': 'VANCOUVER', 'countryCode': 'USA', 'state': 'WA'}","{'streetAddress': 'VA Southern Oregon Rehabilitation Center (692) 8495 Crater Lake Hwy', 'streetAddress2': 'White City, OR. 97503', 'city': {'name': ''}, 'state': {'name': ''}, 'zip': '97503', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/019d96aff2e141f6be46dda1b0084cae/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=019d96aff2e141f6be46dda1b0084cae&limit=1'}]",[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/841c95be5f9c43b08d592409a4c00a49/download?api_key=null&token=]
283,01198596075342d3ba066f3f1047e599,BASH (Bird Aircraft Strike Hazard) TREE REMOVAL,FA5000-20-Q-0055,DEPT OF DEFENSE,DEPT OF THE AIR FORCE,FA5000 673 CONS LGC,2020-05-29,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,auto15,2020-06-27,Service-Disabled Veteran-Owned Small Business (SDVOSB) Set-Aside (FAR 19.14),SDVOSBC,2020-06-12T12:00:00-08:00,561730.0,F014,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'robert.briggs.13@us.af.mil', 'phone': '9075525488', 'title': None, 'fullName': 'Robert W. Briggs'}, {'fax': '', 'type': 'secondary', 'email': 'rodrigo.laguna_lagos.3@us.af.mil', 'phone': '9075523450', 'title': None, 'fullName': 'Rodrigo E. Laguna Lagos'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=01198596075342d3ba066f3f1047e599,OFFICE,"{'zipcode': '99506-2501', 'city': 'JBER', 'countryCode': 'USA', 'state': 'AK'}","{'city': {'code': 'AK-06', 'name': 'JBER'}, 'state': {'code': 'AK', 'name': 'Alaska'}, 'zip': '99506', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/01198596075342d3ba066f3f1047e599/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=01198596075342d3ba066f3f1047e599&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/2927e9b1b4044d999a01f6cae2c04c84/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/c9b6f77bffab4958aef74580d583cdc1/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/2aa2b99831bc4e99af5640756fd7c57a/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/35d06a022a3d4122b7502a9a682a833a/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/db6157a78768470d9e64007567cec1b4/download?api_key=null&token=, 
https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/1c927e0060b5423482d185ebf37aabd0/download?api_key=null&token=]"
284,00d3d2a3ab47486894a35ba3302e8c8c,Facility Sanitation HVAC Cleansing,NMAN7100-20-00736,"COMMERCE, DEPARTMENT OF",NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION,DEPT OF COMMERCE NOAA,2020-05-29,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,auto15,2020-06-23,Total Small Business Set-Aside (FAR 19.5),SBA,2020-06-08T08:00:00-04:00,561790.0,K044,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'Christopher.Baker@noaa.gov', 'phone': '', 'title': None, 'fullName': 'Christopher Baker'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=00d3d2a3ab47486894a35ba3302e8c8c,OFFICE,"{'zipcode': '64106', 'city': 'KANSAS CITY', 'countryCode': 'USA', 'state': 'MO'}","{'city': {'code': '57000', 'name': 'Norfolk'}, 'state': {'code': 'VA', 'name': 'Virginia'}, 'zip': '23510', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/00d3d2a3ab47486894a35ba3302e8c8c/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=00d3d2a3ab47486894a35ba3302e8c8c&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/25d7381ccad24299810283f3c268858e/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/1d7471c505aa49fcbbb1d71502743e84/download?api_key=null&token=]"
285,00c910a02810491b967a6284b8d54f34,Maine Army National Guard Lodging Blanket Purchase Agreement,W912JD-20-R-0002,DEPT OF DEFENSE,DEPT OF THE ARMY,W7NC USPFO ACTIVITY ME ARNG,2020-05-29,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,auto15,2020-07-02,Total Small Business Set-Aside (FAR 19.5),SBA,2020-06-17T15:00:00-04:00,721110.0,V231,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'daniel.s.bibeau.mil@mail.mil', 'phone': '2074306219', 'title': None, 'fullName': 'Daniel S Bibeau'}, {'fax': '', 'type': 'secondary', 'email': 'jeffrey.j.phair.mil@mail.mil', 'phone': '2074305610', 'title': None, 'fullName': 'Jeffrey J. Phair'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=00c910a02810491b967a6284b8d54f34,OFFICE,"{'zipcode': '04333-0032', 'city': 'AUGUSTA', 'countryCode': 'USA', 'state': 'ME'}","{'state': {'code': 'ME', 'name': 'Maine'}, 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/00c910a02810491b967a6284b8d54f34/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=00c910a02810491b967a6284b8d54f34&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/4ad6aa330ddd4e4cab792a6bd6ff5f29/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/725ae4e4c1074c0fba327fdaa5900079/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/6c36236c81e14db2a21586ca83392618/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/995577f931b844edb08cba71be3161bb/download?api_key=null&token=]"
286,00a3cc19ad634cb3b59fc72c223cf48a,REVETMENT TANK CLEANING SERVICES,W912EQ20Q0032,DEPT OF DEFENSE,DEPT OF THE ARMY,W07V ENDIST MEMPHIS,2020-05-29,Combined Synopsis/Solicitation,Combined Synopsis/Solicitation,auto15,2020-06-19,Total Small Business Set-Aside (FAR 19.5),SBA,2020-06-04T10:00:00-05:00,562211.0,J020,Yes,,"[{'fax': '9015443710', 'type': 'primary', 'email': 'valerie.marshall@usace.army.mil', 'phone': '9015443619', 'title': None, 'fullName': 'Valerie Marshall'}, {'fax': '', 'type': 'secondary', 'email': 'monica.a.moody@usace.army.mil', 'phone': '9015440838', 'title': None, 'fullName': 'Monica Moody'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=00a3cc19ad634cb3b59fc72c223cf48a,OFFICE,"{'zipcode': '38103-1894', 'city': 'MEMPHIS', 'countryCode': 'USA', 'state': 'TN'}","{'city': {'code': '48000', 'name': 'Memphis'}, 'state': {'code': 'TN', 'name': 'Tennessee'}, 'zip': '38109', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/00a3cc19ad634cb3b59fc72c223cf48a/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=00a3cc19ad634cb3b59fc72c223cf48a&limit=1'}]",[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/022f3ae82e974393ad12ad2f305155df/download?api_key=null&token=]


In [12]:
data.duplicated('noticeId').sum()

0

In [15]:
# data.drop_duplicates('noticeId', inplace=True)

In [13]:
data.to_csv('./data/6_02_pull.csv', index=False)

In [14]:
df1 = pd.read_csv('./data/combined.csv')
df2 = pd.read_csv('./data/6_02_pull.csv')

In [17]:
df2.shape

(10000, 27)

In [15]:
df = pd.concat([df1, df2])

In [16]:
df.shape

(47741, 27)
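
Because each new pull starts at the latest `postedDate` already on file, notices posted on that day may appear in both frames. A small sketch of dropping such repeats on `noticeId` after concatenation follows; the rows are made up and stand in for `combined.csv` and the new pull.

```python
import pandas as pd

# Made-up rows standing in for combined.csv and the new pull
existing = pd.DataFrame({
    "noticeId": ["a1", "b2"],
    "postedDate": ["2020-05-13", "2020-05-14"],
})
new_pull = pd.DataFrame({
    "noticeId": ["b2", "c3"],
    "postedDate": ["2020-05-14", "2020-06-01"],
})

combined = pd.concat([existing, new_pull], ignore_index=True)
# Keep the first copy of each notice and drop later repeats
combined = combined.drop_duplicates(subset="noticeId", keep="first")
```

Running `combined.duplicated('noticeId').sum()` before saving is a quick way to confirm whether this step is needed at all.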

In [17]:
df.to_csv('./data/combined.csv', index=False)