### Capstone Idea

[GitHub](https://github.com/biborsz/Capstone)

**Problem Statement:** 

Beta.SAM.gov is the successor to fedbizops.gov, among other federal government websites. The part that interests me is its searchable collection of federal contract solicitations and awards, as well as its information system of past grants. The current system is not yet fully operational. Nonetheless, it allows users to search the collection by key terms, grant ID, etc. One of its problems in the past was that, while it was searchable, it did not have a recommender system. Businesses, especially small businesses, had to spend a considerable amount of time finding relevant grant opportunities. The new system offers that. Creating an account has the advantage of receiving updates. However, the downside of email updates is that they clog up the inbox and do not always generate the expected value:
- it is difficult to know whether there are more opportunities out there or just the ones about which a business gets notified
- reader fatigue may cause businesses not to look for contract opportunities beyond what has already been sent to them, and thus leave potential funding on the table
- someone still needs to sift through a large number of potentially irrelevant federal grant descriptions

All in all: a user-controlled recommender system would make grant searching more effective for businesses that do not have many resources to allocate to that activity in the first place.

How it would work:

- based on archival data, it would search for similarities in the following (note: not possible, because the archived data does not contain this information)
   - business activity of applicants
   - name of funding agency
   
   - earlier search terms
   - successful earlier grant applications

-> it would give a list of potentially useful search terms, with an option to click on a select few

-> based on the search terms selected - it will further tune the recommender system

-> businesses have an option of rating a recommendation up or down

- based on earlier search terms, one would have access to a longer list of opportunities, which would make browsing possible (right now, it looks to me, that is out of the question)

- companies could search for other relevant information - for example: which companies are applying for similar grants in their business and geographic area
   
   
[API documentation](https://open.gsa.gov/api/get-opportunities-public-api/#user-account-api-key-creation)

[Beta.SAM.gov](https://beta.sam.gov/)

**Methodology:**
   - content based recommender system
   - text vectorizer: *bag-of-words*, *one-hot-encoder* 
     - bag of words: extracts words from the corpus as features
     - one-hot-encoder: gives a feature a value of 1 if it occurs in a row/ text, and 0 otherwise
   - classifies/ recommends based on *cosine similarities*
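
The methodology above can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the helper names (`tokenize`, `build_vocabulary`, `binary_vector`, `recommend`) are hypothetical, the tokenizer is a simple regular expression, and the three titles are copied from the sample output further down in this notebook.

```python
import math
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(texts):
    """Bag of words: every distinct token in the corpus becomes a feature."""
    return sorted({token for text in texts for token in tokenize(text)})


def binary_vector(text, vocabulary):
    """One-hot style encoding: 1 if the feature occurs in the text, else 0."""
    tokens = set(tokenize(text))
    return [1 if word in tokens else 0 for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, titles, top_n=3):
    """Rank titles by cosine similarity to the query, most similar first."""
    vocabulary = build_vocabulary(titles)
    query_vec = binary_vector(query, vocabulary)
    scored = [
        (cosine_similarity(query_vec, binary_vector(title, vocabulary)), title)
        for title in titles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Titles taken from the sample output further down in this notebook
titles = [
    "Design Build C-130H/J Fuselage Trainer Facility",
    "Sheridan Lake Emergency Spillway Repair Phase 2",
    "Emergency Equipment Rental Agreement - Mobile Food Draft Specification",
]
for score, title in recommend("emergency spillway repair", titles):
    print(round(score, 3), title)
```

A title that shares more words with the query scores closer to 1, and a title with no words in common scores 0. In practice a vectorizer from a library such as scikit-learn would likely replace the hand-written helpers.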

**Sources:**

https://towardsdatascience.com/how-to-build-a-simple-recommender-system-in-python-375093c3fb7d

https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html

http://recommender-systems.org/content-based-filtering/

https://heartbeat.fritz.ai/recommender-systems-with-python-part-i-content-based-filtering-5df4940bd831

**Ideas for the future:**
 - find database of registered businesses (business activity/ line of business)
 - cross-reference awardees in prior grants to provide a list of competitors

In [1]:
# imports
import pandas as pd
import numpy as np
import requests
import time
import datetime

In [2]:
# set display options 
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', None)

In [None]:
# rewrite code - to read in combined file - 
# separate out dates 
# split string '-'
# turn them into datetime object
# check for oldest date
# turn back into string
# make that postedTo date
# have for loop pull another page
# transform json object
# concat it to existing dataframe
# save it to csv
# I am at 03/02/2020

In [2]:
# define function to return minimum postedDate as string
def get_min_post_date(filename):
    # read in file of downloaded contract opportunities
    df = pd.read_csv(filename)
    # get min posted date
    min_date = pd.to_datetime(df['postedDate']).min()
    return min_date.strftime('%m/%d/%Y')

In [3]:
# define function to return maximum postedDate as string
def get_max_post_date(filename):
    # read in file of downloaded contract opportunities
    df = pd.read_csv(filename)
    # get max posted date
    max_date = pd.to_datetime(df['postedDate']).max()
    return max_date.strftime('%m/%d/%Y')

In [5]:
# define function to return today's date as string
def today():
    now = datetime.datetime.now()
    return now.strftime('%m/%d/%Y')

In [6]:
today()

'05/27/2020'

In [4]:
get_max_post_date('./data/combined.csv')

'05/14/2020'

In [7]:
# pull contract information from api.sam.gov

# set base url
url = 'https://api.sam.gov/prod/opportunities/v1/search'

# create empty list to store responses
result = []
# downloaded contract opportunities - file name
# file = './data/combined.csv'

# set postedTo date to today's date by calling today() function
postedTo = today()

# set postedFrom date to the latest postedDate already on file
postedFrom = get_max_post_date('./data/combined.csv')

# for loop to pull 10 pages of up to 1000 contracts each
for page in range(10):
    # do a get request
    req = requests.get(url,
                       params={
                           'api_key': '',
                           'postedFrom': postedFrom,
                           'postedTo': postedTo,
                           'limit': 1000,
                           'offset': page * 1000
                       })

    # add response to result list
    result.append(req)

    # print a timestamp, then pause before the next request
    now = datetime.datetime.now()
    print('Time:', now.strftime("%Y-%m-%d %H:%M:%S"))
    time.sleep(5)

# source for datetime - https://www.w3resource.com/python-exercises/python-basic-exercise-3.php

Time: 2020-05-27 21:34:15
Time: 2020-05-27 21:34:47
Time: 2020-05-27 21:35:16
Time: 2020-05-27 21:35:45
Time: 2020-05-27 21:36:13
Time: 2020-05-27 21:36:40
Time: 2020-05-27 21:37:09
Time: 2020-05-27 21:37:34
Time: 2020-05-27 21:38:02
Time: 2020-05-27 21:38:30


In [8]:
# unpack list of json objects from response data
ops = []
for item in result:
    print(item.headers)
    ops.append(item.json())

{'Age': '24', 'Content-Type': 'application/hal+json', 'Date': 'Thu, 28 May 2020 01:34:15 GMT', 'Server': 'openresty', 'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'Via': 'http/1.1 api-umbrella (ApacheTrafficServer [cMsSf ])', 'X-Cache': 'MISS', 'X-Forwarded-For': '74.96.156.35, 10.177.16.72, 10.177.55.40, 10.177.55.40', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Set-Cookie': 'citrix_ns_id=iDzZbKES6t1h9dwKbzNzM3krjdw0002; Domain=.sam.gov; Path=/; Secure; HttpOnly', 'Cache-Control': 'private', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked'}
{'Age': '27', 'Content-Type': 'application/hal+json', 'Date': 'Thu, 28 May 2020 01:34:47 GMT', 'Server': 'openresty', 'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers', 'Via': 'http/1.1 api-umbrella (ApacheTrafficServer [cMsSf ])', 'X-Cache': 'MISS', 'X-Forwarded-For': '74.96.156.35, 10.177.16.72, 10.177.52.13, 1

In [9]:
# parse json objects
ls_data = []
for i in range(len(ops)):
    print(ops[i].keys())
    df = pd.DataFrame(ops[i]['opportunitiesData'])
    ls_data.append(df)

# combine all pages once, after the loop
data = pd.concat(ls_data)

dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
dict_keys(['totalRecords', 'limit', 'offset', 'opportunitiesData', 'links'])
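
Each response carries `totalRecords`, `limit` and `offset` alongside the data, so one possible refinement is to derive the number of requests from `totalRecords` instead of always pulling ten pages. The sketch below is only an illustration: `page_offsets` is a hypothetical helper, and it assumes that `offset` counts records, as the pull loop above does. Whether the API treats `offset` as a record count or as a page index is not confirmed here.

```python
def page_offsets(total_records, limit=1000, max_pages=None):
    """Return the record offsets needed to page through total_records rows.

    Assumes that `offset` counts records (an assumption, not confirmed
    against the API documentation).
    """
    if total_records <= 0 or limit <= 0:
        return []
    n_pages = -(-total_records // limit)  # ceiling division
    if max_pages is not None:
        n_pages = min(n_pages, max_pages)
    return [page * limit for page in range(n_pages)]


print(page_offsets(2500))  # [0, 1000, 2000]
```

The first request would supply `totalRecords`, and the remaining offsets could then drive the loop, with `max_pages` acting as a cap on the number of requests.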


In [10]:
data.shape

(10000, 27)

In [11]:
data.tail(10)

Unnamed: 0,noticeId,title,solicitationNumber,department,subTier,office,postedDate,type,baseType,archiveType,archiveDate,typeOfSetAsideDescription,typeOfSetAside,responseDeadLine,naicsCode,classificationCode,active,award,pointOfContact,description,organizationType,officeAddress,placeOfPerformance,additionalInfoLink,uiLink,links,resourceLinks
990,a3a6bbb5b62b41e6a1ef068b47e739d0,"Multiple Award Task Order Contract (MATOC) for Construction Services, USACE Seattle District",W912DW20R0009,DEPT OF DEFENSE,DEPT OF THE ARMY,US ARMY ENGINEER DISTRICT SEATTLE,2020-05-15,Presolicitation,Sources Sought,autocustom,2020-12-31,Total Small Business Set-Aside (FAR 19.5),SBA,2020-07-01T14:00:00-07:00,236220,Y1JZ,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'lorraine.q.laurente@usace.army.mil', 'phone': '', 'title': None, 'fullName': 'Lorraine Laurente'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a3a6bbb5b62b41e6a1ef068b47e739d0,OFFICE,"{'zipcode': '98134-2329', 'city': 'SEATTLE', 'countryCode': 'USA', 'state': 'WA'}","{'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/a3a6bbb5b62b41e6a1ef068b47e739d0/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a3a6bbb5b62b41e6a1ef068b47e739d0&limit=1'}]",
991,a340eda1f9474d509f49011d31be4053,"10--SIGHT,REAR",SPE7L320U1314,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),DLA LAND AND MARITIME,2020-05-15,Award Notice,Award Notice,auto15,2020-05-30,,,,332994,10,Yes,"{'date': '2020-05-15', 'number': 'SPE7L320D61BH', 'amount': '250000.00', 'awardee': {'name': 'Quantico Tactical Incorporated 9796 Aberdeen Rd Aberdeen 28315-7742 US', 'location': {}, 'duns': '080991717', 'cageCode': '80MP4'}}","[{'fax': None, 'type': 'primary', 'email': 'NATALYA.RADYK@DLA.MIL', 'phone': None, 'title': None, 'fullName': 'Natalya Radyk614-692-4884'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a340eda1f9474d509f49011d31be4053,OFFICE,"{'zipcode': '43218-3990', 'city': 'COLUMBUS', 'countryCode': 'USA', 'state': 'OH'}",{},,https://beta.sam.gov/opp/a340eda1f9474d509f49011d31be4053/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a340eda1f9474d509f49011d31be4053&limit=1'}]",
992,a32d59c2b966428ca3a601163e7e1bef,Design Build C-130H/J Fuselage Trainer Facility,W9127S20R6019,DEPT OF DEFENSE,DEPT OF THE ARMY,US ARMY ENGINEER DISTRICT LITTLE RO,2020-05-15,Presolicitation,Presolicitation,autocustom,2020-09-30,,,2020-06-15T13:00:00-05:00,236220,Y1JZ,Yes,,"[{'fax': '', 'type': 'primary', 'email': 'Timothy.A.Pighee@usace.army.mil', 'phone': '5013401270', 'title': None, 'fullName': 'Timothy Pighee'}, {'fax': '', 'type': 'secondary', 'email': 'marcus.d.mitchell@usace.army.mil', 'phone': '5013401253', 'title': None, 'fullName': 'Marcus D. Mitchell'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a32d59c2b966428ca3a601163e7e1bef,OFFICE,"{'zipcode': '72201-3225', 'city': 'LITTLE ROCK', 'countryCode': 'USA', 'state': 'AR'}","{'city': {'code': '34750', 'name': 'Jacksonville'}, 'state': {'code': 'AR', 'name': 'Arkansas'}, 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/a32d59c2b966428ca3a601163e7e1bef/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a32d59c2b966428ca3a601163e7e1bef&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/9015b32142f747efa727fe8dc93cade2/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/e5998fd729784c42b3f70e6e2b98e6f4/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/49a0a68ec1f7430ca7e86ec79d70e582/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/8a69c2f9d69b41b0b9aeea0d6e1b270c/download?api_key=null&token=]"
993,a30eee8b8ba047bba134f5971ee984e1,"WHEEL SUBASSEMBLY,O",SPRPA120QR317,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),"DLA AVIATION AT PHILADELPHIA, PA",2020-05-15,Solicitation,Solicitation,auto15,2020-06-10,,,2020-05-26,333999,1630,Yes,{'awardee': {'location': {}}},"[{'fax': None, 'type': 'primary', 'email': 'JOSEPH.GAMBINO@DLA.MIL', 'phone': None, 'title': None, 'fullName': 'Telephone: 2157374239'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a30eee8b8ba047bba134f5971ee984e1,OFFICE,"{'zipcode': '19111-5098', 'city': 'PHILADELPHIA', 'countryCode': 'USA', 'state': 'PA'}",{},,https://beta.sam.gov/opp/a30eee8b8ba047bba134f5971ee984e1/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a30eee8b8ba047bba134f5971ee984e1&limit=1'}]",
994,a2b62f1f6481424e8eccc0d41baeac5d,"59--PROTECTOR,ELECTRICA",SPE4A620T952M,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),DLA AVIATION,2020-05-15,Award Notice,Award Notice,auto15,2020-05-30,,,,335931,59,Yes,"{'date': '2020-05-15', 'number': 'SPE4A620PH797', 'amount': '102847.40', 'awardee': {'name': 'Atlantic Diving Supply, Inc. DBA 621 Lynnhaven Pkwy Ste 160 Virginia Beach 2345', 'location': {}, 'duns': '027079776', 'cageCode': '1CAY9'}}","[{'fax': None, 'type': 'primary', 'email': 'Jean.Switzer@dla.mil', 'phone': None, 'title': None, 'fullName': 'Jean Switzer804-279-5575'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a2b62f1f6481424e8eccc0d41baeac5d,OFFICE,"{'zipcode': '23297', 'city': 'RICHMOND', 'countryCode': 'USA', 'state': 'VA'}",{},,https://beta.sam.gov/opp/a2b62f1f6481424e8eccc0d41baeac5d/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a2b62f1f6481424e8eccc0d41baeac5d&limit=1'}]",
995,a2398da2569f4defb99c390d7e728e10,Emergency Equipment Rental Agreement - Mobile Food Draft Specification,SN-2020-14,"AGRICULTURE, DEPARTMENT OF",FOREST SERVICE,USDA FOREST SERVICE,2020-05-15,Special Notice,Solicitation,autocustom,2021-05-21,Total Small Business Set-Aside (FAR 19.5),SBA,2020-05-21T12:00:00-06:00,722310,F003,Yes,,"[{'fax': '208-387-5384', 'type': 'primary', 'email': 'larry.robillard@usda.gov', 'phone': '208-387-5682 / 208-407-6003 cell', 'title': 'Contracting Officer', 'fullName': 'Larry J. Robillard'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a2398da2569f4defb99c390d7e728e10,OFFICE,"{'zipcode': '83705', 'city': 'BOISE', 'countryCode': 'USA', 'state': 'ID'}","{'city': {'code': '8700', 'name': 'Boise'}, 'state': {'code': 'ID', 'name': 'Idaho'}, 'zip': '83709', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/a2398da2569f4defb99c390d7e728e10/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a2398da2569f4defb99c390d7e728e10&limit=1'}]",[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/67d915c1a20d4b98ba57afa71c69825f/download?api_key=null&token=]
996,a22a8028da374702a29aef147c3646cb,6505--GUAIFENESIN,36C77020Q0470,"VETERANS AFFAIRS, DEPARTMENT OF","VETERANS AFFAIRS, DEPARTMENT OF",NATIONAL CMOP OFFICE (36C770),2020-05-15,Presolicitation,Presolicitation,autocustom,2020-06-18,Total Small Business Set-Aside (FAR 19.5),SBA,2020-05-19,325412,6505,Yes,{'awardee': {'location': {}}},"[{'fax': None, 'type': 'primary', 'email': 'Kevin.Mahoney3@va.gov', 'phone': None, 'title': None, 'fullName': 'Kevin A Mahoney Contract Specialist 913-684-0141'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a22a8028da374702a29aef147c3646cb,OFFICE,"{'zipcode': '66048', 'city': 'LEAVENWORTH', 'countryCode': 'USA', 'state': 'KS'}",{},,https://beta.sam.gov/opp/a22a8028da374702a29aef147c3646cb/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a22a8028da374702a29aef147c3646cb&limit=1'}]",
997,a20ef18783bd455d86d33e266b90844f,"61--CABLE,SPECIAL PURPO",SPE7M120U0401,DEPT OF DEFENSE,DEFENSE LOGISTICS AGENCY (DLA),DLA LAND AND MARITIME,2020-05-15,Award Notice,Award Notice,auto15,2020-05-30,,,,335929,61,Yes,"{'date': '2020-05-15', 'number': 'SPE7M120D60RT', 'amount': '250000.00', 'awardee': {'name': 'F & R SALES, INC. DBA MCCAFFREY 109 GAUNTT ST BURLINGTON 08016-1938 US', 'location': {}, 'duns': '604357087', 'cageCode': '1V630'}}","[{'fax': None, 'type': 'primary', 'email': 'Dale.Loney@dla.mil', 'phone': None, 'title': None, 'fullName': 'Dale Loney614-692-3059'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a20ef18783bd455d86d33e266b90844f,OFFICE,"{'zipcode': '43218-3990', 'city': 'COLUMBUS', 'countryCode': 'USA', 'state': 'OH'}",{},,https://beta.sam.gov/opp/a20ef18783bd455d86d33e266b90844f/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a20ef18783bd455d86d33e266b90844f&limit=1'}]",
998,a1d3f34994ca45baa21bffa41dfbd411,Sheridan Lake Emergency Spillway Repair Phase 2,1267T020Q0020,"AGRICULTURE, DEPARTMENT OF",FOREST SERVICE,Black Hills National Forest,2020-05-15,Solicitation,Solicitation,auto15,2020-06-20,Total Small Business Set-Aside (FAR 19.5),SBA,2020-06-05T16:00:00-06:00,237990,Z1KA,Yes,,"[{'fax': '6056739217', 'type': 'primary', 'email': 'brenda.oster@usda.gov', 'phone': '6056739322', 'title': None, 'fullName': 'Brenda Oster'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a1d3f34994ca45baa21bffa41dfbd411,OFFICE,"{'zipcode': '57730', 'city': 'CUSTER', 'countryCode': 'USA', 'state': 'SD'}","{'city': {'code': '52980', 'name': 'Rapid City'}, 'state': {'code': 'SD', 'name': 'South Dakota'}, 'zip': '57702', 'country': {'code': 'USA', 'name': 'UNITED STATES'}}",,https://beta.sam.gov/opp/a1d3f34994ca45baa21bffa41dfbd411/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a1d3f34994ca45baa21bffa41dfbd411&limit=1'}]","[https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/19a4b0cc7d4140cc81b13346f599f7a6/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/fff2a16aec5049c9957e7484b1bb1b81/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/ab9412a277e445008aed63aa71474c52/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/2bdbeb395a314f71abb55558a3c4c4fd/download?api_key=null&token=, https://beta.sam.gov/api/prod/opps/v3/opportunities/resources/files/d82d4aac842645b6b7ca3b4cc068a498/download?api_key=null&token=]"
999,a1b2ddb544de4a60b74c2183ba1804d2,"48--CAP,VALVE",N0010420QEE61,DEPT OF DEFENSE,DEPT OF THE NAVY,NAVSUP WEAPON SYSTEMS SUPPORT MECH,2020-05-15,Solicitation,Solicitation,auto15,2020-07-02,,,2020-06-17,333999,4820,Yes,{'awardee': {'location': {}}},"[{'fax': None, 'type': 'primary', 'email': 'BRIANNA.L.WALKER@NAVY.MIL', 'phone': None, 'title': None, 'fullName': 'Telephone: 7176051391'}]",https://api.sam.gov/prod/opportunities/v1/noticedesc?noticeid=a1b2ddb544de4a60b74c2183ba1804d2,OFFICE,"{'zipcode': '17050-0788', 'city': 'MECHANICSBURG', 'countryCode': 'USA', 'state': 'PA'}",{},,https://beta.sam.gov/opp/a1b2ddb544de4a60b74c2183ba1804d2/view,"[{'rel': 'self', 'href': 'https://api.sam.gov/prod/opportunities/v1/search?noticeid=a1b2ddb544de4a60b74c2183ba1804d2&limit=1'}]",


In [14]:
data.duplicated('noticeId').sum()

6

In [15]:
# data.drop_duplicates('noticeId', inplace=True)

In [15]:
data.to_csv('./data/5_27_pull.csv', index=False)

In [16]:
df1 = pd.read_csv('./data/combined.csv')
df2 = pd.read_csv('./data/5_27_pull.csv')

In [17]:
df2.shape

(10000, 27)

In [18]:
df = pd.concat([df1, df2])

In [19]:
df.shape

(39928, 27)

In [20]:
df.to_csv('./data/combined.csv', index=False)
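
Because `postedFrom` is set to the latest `postedDate` already on file, each new pull requests that day again, and the check above found 6 repeated `noticeId` values within a single pull, so the combined file may accumulate repeated notices. The plain-Python sketch below shows the keep-first logic on row dictionaries; `merge_unique` is a hypothetical helper and the sample `noticeId` values are made up. On the DataFrames in this notebook, the commented-out `drop_duplicates('noticeId')` call applies the same rule, since pandas keeps the first occurrence by default.

```python
def merge_unique(existing_rows, new_rows, key="noticeId"):
    """Concatenate two lists of row dicts, keeping the first row per key."""
    seen = set()
    merged = []
    for row in list(existing_rows) + list(new_rows):
        if row[key] in seen:
            continue  # skip a notice that was already kept
        seen.add(row[key])
        merged.append(row)
    return merged


# Made-up sample rows for illustration only
old = [{"noticeId": "a1", "postedDate": "2020-05-14"}]
new = [
    {"noticeId": "a1", "postedDate": "2020-05-14"},
    {"noticeId": "b2", "postedDate": "2020-05-15"},
]
print(len(merge_unique(old, new)))  # 2
```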