<h1 align='center'> Python_Brewing_API</h1>

- Homebrew, "the missing package manager for macOS (or Linux)", records how many times each package has been installed and exposes that information through a JSON API.

- To get the information, we send requests to two URLs: one returns the name (`package_name`) and description of every package (`https://formulae.brew.sh/api/formula.json`); the other returns the analytics data for a single package (`https://formulae.brew.sh/api/formula/{package_name}.json`, for example `a2ps.json`).

- To avoid putting unnecessary load on the server, we limit how much data we extract.
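
As a rough sketch of the two endpoints described above, the helper below builds the per-package URL from a name and caps how many packages are processed. The names `formula_url`, `first_n_urls`, and the `limit` parameter are illustrative assumptions, not part of the Homebrew API, and no request is actually sent.

```python
from itertools import islice

# The two endpoints named in the text above.
LIST_URL = "https://formulae.brew.sh/api/formula.json"
FORMULA_URL = "https://formulae.brew.sh/api/formula/{package_name}.json"


def formula_url(package_name):
    """Return the per-package URL for one package (hypothetical helper)."""
    return FORMULA_URL.format(package_name=package_name)


def first_n_urls(package_names, limit=10):
    """Build URLs for at most `limit` packages to keep the request count small."""
    return [formula_url(name) for name in islice(package_names, limit)]


urls = first_n_urls(["a2ps", "a52dec", "aacgain"], limit=2)
print(urls)
```

Capping the list before any request is made keeps the limit in one obvious place instead of relying on a `break` inside the loop.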

**tips**

- Collect the extracted data for each package in a dictionary of key/value pairs, then append each dictionary to a list
- Write the data to a JSON file using `json.dump(data, f, indent=2)` for better readability
- Read the JSON file back using `data = json.load(f)`, then sort the data using `data.sort(key=func)`
- Load the JSON data into pandas, convert the `analytic` column to a list, and build a second DataFrame from it;
    then concatenate the new DataFrame with the original one and do some analysis
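
The first three tips can be sketched end to end without touching the network. The two records are copied from the output later in this notebook; the temporary directory is an assumption made so the example does not overwrite an existing `package_info.json`.

```python
import json
import os
import tempfile

# Two records in the same shape the notebook builds (values copied from its output).
results = [
    {"name": "a52dec", "desc": "Library for decoding ATSC A/52 streams (AKA 'AC-3')",
     "analytic": {"30d": 26, "90d": 82, "365d": 348}},
    {"name": "a2ps", "desc": "Any-to-PostScript filter",
     "analytic": {"30d": 110, "90d": 320, "365d": 1304}},
]

# Write to a throwaway file; indent=2 keeps the file human-readable.
path = os.path.join(tempfile.mkdtemp(), "package_info.json")
with open(path, "w") as f:
    json.dump(results, f, indent=2)

# Read it back and sort in place by 30-day installs, highest first.
with open(path) as f:
    data = json.load(f)
data.sort(key=lambda package: package["analytic"]["30d"], reverse=True)

print([package["name"] for package in data])
```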

In [1]:
# %load ../command1.py

import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity='all'

pd.options.display.float_format='{:,.2f}'.format
pd.set_option('display.max_colwidth', None)


In [2]:
import requests
import json
import time

In [3]:
r=requests.get('https://formulae.brew.sh/api/formula.json')
packages_json=r.json() 

# serialize the first three packages back to an indented JSON string to inspect the structure
packages_str=json.dumps(packages_json[0:3], indent=4) 
print(packages_str)


# Next step: extract only each package's name and description

[
    {
        "name": "a2ps",
        "full_name": "a2ps",
        "tap": "homebrew/core",
        "oldname": null,
        "aliases": [],
        "versioned_formulae": [],
        "desc": "Any-to-PostScript filter",
        "license": "GPL-3.0-or-later",
        "homepage": "https://www.gnu.org/software/a2ps/",
        "versions": {
            "stable": "4.14",
            "head": null,
            "bottle": true
        },
        "urls": {
            "stable": {
                "url": "https://ftp.gnu.org/gnu/a2ps/a2ps-4.14.tar.gz",
                "tag": null,
                "revision": null
            }
        },
        "revision": 0,
        "version_scheme": 0,
        "bottle": {
            "stable": {
                "rebuild": 4,
                "root_url": "https://ghcr.io/v2/homebrew/core",
                "files": {
                    "arm64_monterey": {
                        "cellar": "/opt/homebrew/Cellar",
                        "url": "https://ghcr.io/v2/h

In [4]:
for package in packages_json:
    package_name=package['name']
    package_desc=package['desc']
    package_url=f'https://formulae.brew.sh/api/formula/{package_name}.json'
    r=requests.get(package_url)
    package_json=r.json() 
    json_str=json.dumps(package_json, indent=4)
    print(json_str)
    break

{
    "name": "a2ps",
    "full_name": "a2ps",
    "tap": "homebrew/core",
    "oldname": null,
    "aliases": [],
    "versioned_formulae": [],
    "desc": "Any-to-PostScript filter",
    "license": "GPL-3.0-or-later",
    "homepage": "https://www.gnu.org/software/a2ps/",
    "versions": {
        "stable": "4.14",
        "head": null,
        "bottle": true
    },
    "urls": {
        "stable": {
            "url": "https://ftp.gnu.org/gnu/a2ps/a2ps-4.14.tar.gz",
            "tag": null,
            "revision": null
        }
    },
    "revision": 0,
    "version_scheme": 0,
    "bottle": {
        "stable": {
            "rebuild": 4,
            "root_url": "https://ghcr.io/v2/homebrew/core",
            "files": {
                "arm64_monterey": {
                    "cellar": "/opt/homebrew/Cellar",
                    "url": "https://ghcr.io/v2/homebrew/core/a2ps/blobs/sha256:b92375f7cc49a7440b431d2248cad0d97c96fcca127dace6efdeb0b2f3faa08c",
                    "sha256": "b

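The cells above call `r.json()` without checking whether the request succeeded. A minimal sketch of a more defensive fetch is shown here, assuming the caller passes in the function that performs the GET. `fetch_json`, `FakeResponse`, and `fake_get` are hypothetical names; the fake objects exist only so the sketch runs offline.

```python
def fetch_json(url, get, timeout=10):
    """Fetch `url` with the supplied `get` callable and return the decoded JSON."""
    response = get(url, timeout=timeout)
    # raise_for_status() turns 4xx/5xx responses into exceptions instead of
    # letting a later .json() call fail on an error page.
    response.raise_for_status()
    return response.json()


class FakeResponse:
    """Offline stand-in exposing the two response methods used above."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # a real response raises here for HTTP error codes

    def json(self):
        return self._payload


def fake_get(url, timeout):
    """Pretend to download `url`; returns a canned payload (assumption)."""
    return FakeResponse({"name": "a2ps", "requested": url})


package = fetch_json("https://formulae.brew.sh/api/formula/a2ps.json", fake_get)
print(package["name"])
```

With the real library this would be called as `fetch_json(url, requests.get)`, since `requests.get` accepts a `timeout` keyword and its response object provides both `raise_for_status()` and `json()`.
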
In [5]:
results=[]
for package in packages_json:
    package_name=package['name']
    package_desc=package['desc']
    package_url=f'https://formulae.brew.sh/api/formula/{package_name}.json'
    r=requests.get(package_url)
    package_json=r.json() # parsed response body (a dict)
    
    install_30d=package_json['analytics']['install_on_request']['30d'][package_name]
    install_90d=package_json['analytics']['install_on_request']['90d'][package_name]
    install_365d=package_json['analytics']['install_on_request']['365d'][package_name]
    
    data={'name': package_name,
         'desc':package_desc,
         'analytic':{'30d':install_30d,
                    '90d':install_90d,
                    '365d':install_365d}}
    results.append(data)
    break

print(results)

[{'name': 'a2ps', 'desc': 'Any-to-PostScript filter', 'analytic': {'30d': 110, '90d': 320, '365d': 1304}}]
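
The direct indexing in the cell above raises `KeyError` as soon as one package lacks an analytics entry. The sketch below is one possible defensive variant, assuming the response shape shown in this notebook (`analytics` → `install_on_request` → period → package name). `extract_installs` is a hypothetical helper, and the sample deliberately omits the `365d` period.

```python
def extract_installs(package_json, periods=("30d", "90d", "365d")):
    """Return {period: count} for one package, using None where data is missing."""
    name = package_json.get("name")
    on_request = package_json.get("analytics", {}).get("install_on_request", {})
    # Each period maps package names to counts, e.g. {"a2ps": 110}.
    return {period: on_request.get(period, {}).get(name) for period in periods}


# Trimmed-down sample in the notebook's shape; "365d" is left out on purpose.
sample = {
    "name": "a2ps",
    "analytics": {"install_on_request": {"30d": {"a2ps": 110}, "90d": {"a2ps": 320}}},
}
print(extract_installs(sample))
# → {'30d': 110, '90d': 320, '365d': None}
```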


In [6]:
results=[]

t1=time.perf_counter()

# retrieving everything takes more than 10 minutes and puts load on the server, so we retrieve only the first ten items
for package in packages_json[:10]:  
    package_name=package['name']
    package_desc=package['desc']
    package_url=f'https://formulae.brew.sh/api/formula/{package_name}.json'
    r=requests.get(package_url)
    package_json=r.json() # parsed response body (a dict)
    
  

    install_30d=package_json['analytics']['install_on_request']['30d'][package_name]
    install_90d=package_json['analytics']['install_on_request']['90d'][package_name]
    install_365d=package_json['analytics']['install_on_request']['365d'][package_name]

    data={'name': package_name,
         'desc':package_desc,
         'analytic':{'30d':install_30d,
                    '90d':install_90d,
                    '365d':install_365d}}
    results.append(data)
    
    # pause for as long as the request took, so we never hit the server back-to-back
    time.sleep(r.elapsed.total_seconds())
    print(f'Got {package_name} in {r.elapsed.total_seconds()} seconds')
    
    
    
t2=time.perf_counter()
print(f'Finished in {t2-t1} seconds')

with open('package_info.json', 'w') as f:
    json.dump(results, f, indent=2)
    

Got a2ps in 0.094825 seconds
Got a52dec in 0.407229 seconds
Got aacgain in 0.208594 seconds
Got aalib in 0.220442 seconds
Got aamath in 0.209638 seconds
Got aarch64-elf-binutils in 0.211691 seconds
Got aarch64-elf-gcc in 0.20662 seconds
Got aardvark_shell_utils in 0.224229 seconds
Got abcde in 0.204381 seconds
Got abcl in 0.207387 seconds
Finished in 4.5924874 seconds
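
The throttling idea in this cell (sleep for as long as the request took) can be isolated into a small function. This is only a sketch: `fetch_politely` is a hypothetical name, the fetcher is passed in as an argument, and elapsed time is measured with `time.perf_counter()` rather than the `r.elapsed` attribute used above.

```python
import time


def fetch_politely(names, fetch, limit=10, sleep=time.sleep):
    """Fetch at most `limit` items, pausing after each for as long as it took."""
    results = {}
    for name in names[:limit]:
        start = time.perf_counter()
        results[name] = fetch(name)
        elapsed = time.perf_counter() - start
        sleep(elapsed)  # back off in proportion to the measured response time
    return results


# Offline demonstration: a fake fetcher and a "sleep" that only records its argument.
pauses = []
fetched = fetch_politely(["a2ps", "a52dec", "aacgain"], fetch=str.upper,
                         limit=2, sleep=pauses.append)
print(fetched)
```

Passing `sleep` as a parameter makes the loop easy to exercise without real delays; in normal use the default `time.sleep` applies.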


In [7]:
with open('package_info.json', 'r') as f:
    data=json.load(f)

print(json.dumps(data[:3], indent=2))
print('\n\n')

data.sort(key=lambda package:package['analytic']['30d'], reverse=True)

print(json.dumps(data[:3], indent=2))

[
  {
    "name": "a2ps",
    "desc": "Any-to-PostScript filter",
    "analytic": {
      "30d": 110,
      "90d": 320,
      "365d": 1304
    }
  },
  {
    "name": "a52dec",
    "desc": "Library for decoding ATSC A/52 streams (AKA 'AC-3')",
    "analytic": {
      "30d": 26,
      "90d": 82,
      "365d": 348
    }
  },
  {
    "name": "aacgain",
    "desc": "AAC-supporting version of mp3gain",
    "analytic": {
      "30d": 44,
      "90d": 169,
      "365d": 683
    }
  }
]



[
  {
    "name": "a2ps",
    "desc": "Any-to-PostScript filter",
    "analytic": {
      "30d": 110,
      "90d": 320,
      "365d": 1304
    }
  },
  {
    "name": "aalib",
    "desc": "Portable ASCII art graphics library",
    "analytic": {
      "30d": 108,
      "90d": 261,
      "365d": 1290
    }
  },
  {
    "name": "aarch64-elf-gcc",
    "desc": "GNU compiler collection for aarch64-elf",
    "analytic": {
      "30d": 84,
      "90d": 286,
      "365d": 397
    }
  }
]
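
`data.sort()` reorders the list in place. When the original order should be kept, or only the top few entries are needed, the standard library offers alternatives. The sketch below uses four records copied from this notebook's output; `installs` is a hypothetical key-builder.

```python
import heapq

data = [
    {"name": "a2ps", "analytic": {"30d": 110, "90d": 320, "365d": 1304}},
    {"name": "a52dec", "analytic": {"30d": 26, "90d": 82, "365d": 348}},
    {"name": "aacgain", "analytic": {"30d": 44, "90d": 169, "365d": 683}},
    {"name": "aalib", "analytic": {"30d": 108, "90d": 261, "365d": 1290}},
]


def installs(period):
    """Build a sort key that reads one analytics period from a record."""
    return lambda package: package["analytic"][period]


# sorted() returns a new list and leaves `data` in its original order.
by_90d = sorted(data, key=installs("90d"), reverse=True)
# nlargest() avoids sorting everything when only the top few are needed.
top2_30d = heapq.nlargest(2, data, key=installs("30d"))

print([p["name"] for p in by_90d])    # → ['a2ps', 'aalib', 'aacgain', 'a52dec']
print([p["name"] for p in top2_30d])  # → ['a2ps', 'aalib']
```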


In [8]:
data1=[item for item in data if 'graphics' in item['desc']]
print(json.dumps(data1, indent=2))

[
  {
    "name": "aalib",
    "desc": "Portable ASCII art graphics library",
    "analytic": {
      "30d": 108,
      "90d": 261,
      "365d": 1290
    }
  }
]
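
The `in` test above is case-sensitive and fails with a `TypeError` if a description is `None`. A slightly more forgiving search could look like the sketch below; `search` is a hypothetical helper and the `no-desc` record is made up to show the missing-description case.

```python
def search(packages, keyword):
    """Return packages whose description contains `keyword`, ignoring case."""
    needle = keyword.casefold()
    # `or ""` guards against a missing or null description.
    return [p for p in packages if needle in (p.get("desc") or "").casefold()]


packages = [
    {"name": "aalib", "desc": "Portable ASCII art graphics library"},
    {"name": "aamath", "desc": "Renders mathematical expressions as ASCII art"},
    {"name": "no-desc", "desc": None},  # made-up record without a description
]
print([p["name"] for p in search(packages, "GRAPHICS")])  # → ['aalib']
print([p["name"] for p in search(packages, "ascii")])     # → ['aalib', 'aamath']
```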


In [9]:
import pandas as pd

In [10]:
df=pd.read_json('package_info.json')
df

Unnamed: 0,name,desc,analytic
0,a2ps,Any-to-PostScript filter,"{'30d': 110, '90d': 320, '365d': 1304}"
1,a52dec,Library for decoding ATSC A/52 streams (AKA 'AC-3'),"{'30d': 26, '90d': 82, '365d': 348}"
2,aacgain,AAC-supporting version of mp3gain,"{'30d': 44, '90d': 169, '365d': 683}"
3,aalib,Portable ASCII art graphics library,"{'30d': 108, '90d': 261, '365d': 1290}"
4,aamath,Renders mathematical expressions as ASCII art,"{'30d': 10, '90d': 30, '365d': 266}"
5,aarch64-elf-binutils,GNU Binutils for aarch64-elf cross development,"{'30d': 33, '90d': 102, '365d': 175}"
6,aarch64-elf-gcc,GNU compiler collection for aarch64-elf,"{'30d': 84, '90d': 286, '365d': 397}"
7,aardvark_shell_utils,Utilities to aid shell scripts or command-line users,"{'30d': 11, '90d': 47, '365d': 185}"
8,abcde,Better CD Encoder,"{'30d': 30, '90d': 127, '365d': 637}"
9,abcl,Armed Bear Common Lisp: a full implementation of Common Lisp,"{'30d': 21, '90d': 131, '365d': 317}"


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   name      10 non-null     object
 1   desc      10 non-null     object
 2   analytic  10 non-null     object
dtypes: object(3)
memory usage: 368.0+ bytes


In [12]:
print(type(df.loc[0, 'analytic']))

<class 'dict'>


In [13]:
anal_info=df['analytic']

anal_info

anal_info_list=anal_info.to_list()

anal_info_list

0    {'30d': 110, '90d': 320, '365d': 1304}
1       {'30d': 26, '90d': 82, '365d': 348}
2      {'30d': 44, '90d': 169, '365d': 683}
3    {'30d': 108, '90d': 261, '365d': 1290}
4       {'30d': 10, '90d': 30, '365d': 266}
5      {'30d': 33, '90d': 102, '365d': 175}
6      {'30d': 84, '90d': 286, '365d': 397}
7       {'30d': 11, '90d': 47, '365d': 185}
8      {'30d': 30, '90d': 127, '365d': 637}
9      {'30d': 21, '90d': 131, '365d': 317}
Name: analytic, dtype: object

[{'30d': 110, '90d': 320, '365d': 1304},
 {'30d': 26, '90d': 82, '365d': 348},
 {'30d': 44, '90d': 169, '365d': 683},
 {'30d': 108, '90d': 261, '365d': 1290},
 {'30d': 10, '90d': 30, '365d': 266},
 {'30d': 33, '90d': 102, '365d': 175},
 {'30d': 84, '90d': 286, '365d': 397},
 {'30d': 11, '90d': 47, '365d': 185},
 {'30d': 30, '90d': 127, '365d': 637},
 {'30d': 21, '90d': 131, '365d': 317}]

In [14]:
anal_info_df=pd.DataFrame(anal_info_list)

anal_info_df

Unnamed: 0,30d,90d,365d
0,110,320,1304
1,26,82,348
2,44,169,683
3,108,261,1290
4,10,30,266
5,33,102,175
6,84,286,397
7,11,47,185
8,30,127,637
9,21,131,317


In [15]:
combined_df=pd.concat([df, anal_info_df], ignore_index=True, axis='columns')

combined_df

Unnamed: 0,0,1,2,3,4,5
0,a2ps,Any-to-PostScript filter,"{'30d': 110, '90d': 320, '365d': 1304}",110,320,1304
1,a52dec,Library for decoding ATSC A/52 streams (AKA 'AC-3'),"{'30d': 26, '90d': 82, '365d': 348}",26,82,348
2,aacgain,AAC-supporting version of mp3gain,"{'30d': 44, '90d': 169, '365d': 683}",44,169,683
3,aalib,Portable ASCII art graphics library,"{'30d': 108, '90d': 261, '365d': 1290}",108,261,1290
4,aamath,Renders mathematical expressions as ASCII art,"{'30d': 10, '90d': 30, '365d': 266}",10,30,266
5,aarch64-elf-binutils,GNU Binutils for aarch64-elf cross development,"{'30d': 33, '90d': 102, '365d': 175}",33,102,175
6,aarch64-elf-gcc,GNU compiler collection for aarch64-elf,"{'30d': 84, '90d': 286, '365d': 397}",84,286,397
7,aardvark_shell_utils,Utilities to aid shell scripts or command-line users,"{'30d': 11, '90d': 47, '365d': 185}",11,47,185
8,abcde,Better CD Encoder,"{'30d': 30, '90d': 127, '365d': 637}",30,127,637
9,abcl,Armed Bear Common Lisp: a full implementation of Common Lisp,"{'30d': 21, '90d': 131, '365d': 317}",21,131,317
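
Because `ignore_index=True` is combined with `axis='columns'`, the column labels are replaced by the integers 0 to 5, which is why the following cells have to drop and rename columns by number. One possible alternative, sketched here on a two-row frame (`df_demo` is an assumed name), drops the dict column first and concatenates without `ignore_index` so the labels survive.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "name": ["a2ps", "a52dec"],
    "analytic": [{"30d": 110, "90d": 320, "365d": 1304},
                 {"30d": 26, "90d": 82, "365d": 348}],
})

# Expand the dict column into its own DataFrame, reusing the original row index.
expanded = pd.DataFrame(df_demo["analytic"].tolist(), index=df_demo.index)

# Without ignore_index=True the column labels survive the concatenation.
combined = pd.concat([df_demo.drop(columns="analytic"), expanded], axis="columns")
print(combined.columns.tolist())  # → ['name', '30d', '90d', '365d']
```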


In [16]:
combined_df.columns

combined_df.drop(2, axis='columns', inplace=True)

combined_df

RangeIndex(start=0, stop=6, step=1)

Unnamed: 0,0,1,3,4,5
0,a2ps,Any-to-PostScript filter,110,320,1304
1,a52dec,Library for decoding ATSC A/52 streams (AKA 'AC-3'),26,82,348
2,aacgain,AAC-supporting version of mp3gain,44,169,683
3,aalib,Portable ASCII art graphics library,108,261,1290
4,aamath,Renders mathematical expressions as ASCII art,10,30,266
5,aarch64-elf-binutils,GNU Binutils for aarch64-elf cross development,33,102,175
6,aarch64-elf-gcc,GNU compiler collection for aarch64-elf,84,286,397
7,aardvark_shell_utils,Utilities to aid shell scripts or command-line users,11,47,185
8,abcde,Better CD Encoder,30,127,637
9,abcl,Armed Bear Common Lisp: a full implementation of Common Lisp,21,131,317


In [17]:
combined_df.rename(columns={0:'Name', 1:'Desc', 3:'30d', 4:'90d', 5:'365d'}, inplace=True)


combined_df

Unnamed: 0,Name,Desc,30d,90d,365d
0,a2ps,Any-to-PostScript filter,110,320,1304
1,a52dec,Library for decoding ATSC A/52 streams (AKA 'AC-3'),26,82,348
2,aacgain,AAC-supporting version of mp3gain,44,169,683
3,aalib,Portable ASCII art graphics library,108,261,1290
4,aamath,Renders mathematical expressions as ASCII art,10,30,266
5,aarch64-elf-binutils,GNU Binutils for aarch64-elf cross development,33,102,175
6,aarch64-elf-gcc,GNU compiler collection for aarch64-elf,84,286,397
7,aardvark_shell_utils,Utilities to aid shell scripts or command-line users,11,47,185
8,abcde,Better CD Encoder,30,127,637
9,abcl,Armed Bear Common Lisp: a full implementation of Common Lisp,21,131,317
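
The split, concatenate, drop, and rename steps can also be approached from the raw records with `pd.json_normalize`, which flattens nested dictionaries into dotted column names. This is a sketch of that alternative on two records from the notebook, not a replacement for the walk-through above.

```python
import pandas as pd

records = [
    {"name": "a2ps", "desc": "Any-to-PostScript filter",
     "analytic": {"30d": 110, "90d": 320, "365d": 1304}},
    {"name": "a52dec", "desc": "Library for decoding ATSC A/52 streams (AKA 'AC-3')",
     "analytic": {"30d": 26, "90d": 82, "365d": 348}},
]

# Nested keys become dotted column names such as "analytic.30d".
flat = pd.json_normalize(records)

# Keep only the part after the last dot, then capitalize the text columns
# so the labels match the ones used in the notebook.
flat = flat.rename(columns=lambda c: c.split(".")[-1])
flat = flat.rename(columns={"name": "Name", "desc": "Desc"})
print(flat)
```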


In [18]:
df_final=combined_df

df_final

Unnamed: 0,Name,Desc,30d,90d,365d
0,a2ps,Any-to-PostScript filter,110,320,1304
1,a52dec,Library for decoding ATSC A/52 streams (AKA 'AC-3'),26,82,348
2,aacgain,AAC-supporting version of mp3gain,44,169,683
3,aalib,Portable ASCII art graphics library,108,261,1290
4,aamath,Renders mathematical expressions as ASCII art,10,30,266
5,aarch64-elf-binutils,GNU Binutils for aarch64-elf cross development,33,102,175
6,aarch64-elf-gcc,GNU compiler collection for aarch64-elf,84,286,397
7,aardvark_shell_utils,Utilities to aid shell scripts or command-line users,11,47,185
8,abcde,Better CD Encoder,30,127,637
9,abcl,Armed Bear Common Lisp: a full implementation of Common Lisp,21,131,317


In [19]:
df_final.sort_values(by='30d', ascending=False)

Unnamed: 0,Name,Desc,30d,90d,365d
0,a2ps,Any-to-PostScript filter,110,320,1304
3,aalib,Portable ASCII art graphics library,108,261,1290
6,aarch64-elf-gcc,GNU compiler collection for aarch64-elf,84,286,397
2,aacgain,AAC-supporting version of mp3gain,44,169,683
5,aarch64-elf-binutils,GNU Binutils for aarch64-elf cross development,33,102,175
8,abcde,Better CD Encoder,30,127,637
1,a52dec,Library for decoding ATSC A/52 streams (AKA 'AC-3'),26,82,348
9,abcl,Armed Bear Common Lisp: a full implementation of Common Lisp,21,131,317
7,aardvark_shell_utils,Utilities to aid shell scripts or command-line users,11,47,185
4,aamath,Renders mathematical expressions as ASCII art,10,30,266


In [20]:
filt=df_final['Desc'].str.contains('mp3')

df_final.loc[filt]

Unnamed: 0,Name,Desc,30d,90d,365d
2,aacgain,AAC-supporting version of mp3gain,44,169,683
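
`str.contains` is case-sensitive by default and returns a missing value for a missing description, which cannot be used directly as a boolean mask. The sketch below shows the `case` and `na` parameters on a small made-up frame (`df_demo` and the `no-desc` row are assumptions for illustration).

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["aacgain", "abcde", "no-desc"],  # "no-desc" is a made-up row
    "Desc": ["AAC-supporting version of mp3gain", "Better CD Encoder", None],
})

# case=False ignores letter case; na=False treats missing descriptions as
# non-matches, which keeps the mask purely boolean so it works with .loc.
mask = df_demo["Desc"].str.contains("MP3", case=False, na=False)
print(df_demo.loc[mask, "Name"].tolist())  # → ['aacgain']
```

The pattern is interpreted as a regular expression by default; passing `regex=False` treats it as a literal substring.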
