<a href="https://colab.research.google.com/github/MinKimIP/IPA-public/blob/master/data_request/2020-01-09%20trade%20mark%20numbers%20check.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Request

9 January 2020

Could I just pass on a query relating to the Trademark dataset on IPGOD?
It looks like the 2018 counts are improbably low.
Is this due to indexing latency, or are we looking in the wrong place?

---

This data request will be answered using [IPGOD 2019](https://data.gov.au/data/dataset/intellectual-property-government-open-data-2019).

Note that IPGOD 2019 has many data quality issues. This notebook simply explores what is in the data and does not attempt to correct it.

In [0]:
%%capture
# run this optional installation code if you do not have the packages
!pip install pandas --upgrade
!pip install pandas-profiling[notebook,html] --upgrade

## Scripts

Run the cell below without any changes.

In [0]:
import pandas as pd
from pandas_profiling import ProfileReport

# data sources

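def ip_data(ip_type, table):
    """Download one IPGOD table from data.gov.au as a DataFrame, with date columns parsed.

    ip_type: 'patent', 'trademark' or 'design'
    table: 'process', 'applicant' or 'classification'
    """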
    url_base = 'https://data.gov.au/data/dataset/a4210de2-9cbb-4d43-848d-46138fefd271/resource/'
    url = {'patent': {'process': '8fa6db74-a461-47f1-acc6-2e0cf7f06bd5/download/ipgod107.csv',
                      'applicant': '846990df-db42-4ad7-bbd6-567fd37a2797/download/ipgod102.csv',
                      'classification': '5aeec421-dddc-4c22-a66a-bfc5ad22947f/download/ipgod104.csv'},
           'trademark': {'process': '4dec358e-14ff-45ef-8b3e-b27274347e23/download/ipgod203.csv',
                         'applicant': 'aae1c14d-f8c0-4540-b5d3-1ed21500271e/download/ipgod202.csv',
                         'classification': 'fb505762-ab2a-4f56-999d-9bedd1da2ad5/download/ipgod204.csv'},
           'design': {'process': '9003a068-82fd-410d-a193-d54b8bc1f171/download/ipgod303.csv',
                      'applicant': '4b802e80-c667-4b84-8f50-72c2624c59c1/download/ipgod302.csv',
                      'classification': 'b01f7e00-a718-4e2d-9ffb-14938fd7dba9/download/ipgod304.csv'}}
    
    df = pd.read_csv(url_base+url[ip_type][table], low_memory=False)
    df = parse_dates(df)

    return df


main_key = {'patent': 'australian_appl_no',
            'trademark': 'tm_number',
            'design': 'application_id'}


# pipe components

def parse_dates(df):
    """Convert every column whose name contains "date" to datetime (modifies df in place)."""
    for column in df.columns:
        if "date" in column:
            df[column] = pd.to_datetime(df[column])

    return df


# pipeline

def profile_data(ip_type, table):
    """Load the requested table and build a minimal pandas-profiling report for it."""
    profile = (ip_data(ip_type, table)
               .pipe(ProfileReport, title=f'{ip_type}-{table}-profile', minimal=True))
    return profile
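
The date-parsing step can be tried without downloading anything. The cell below is a minimal sketch on a made-up frame: the column names `lodgement_date` and `status` and all values are invented for illustration and are not taken from IPGOD, and `parse_dates_demo` is just a copy of the `parse_dates` logic above so the cell runs on its own.

```python
import pandas as pd

def parse_dates_demo(df):
    # Same logic as parse_dates above: convert columns with "date" in the name.
    for column in df.columns:
        if "date" in column:
            df[column] = pd.to_datetime(df[column])
    return df

# Hypothetical rows; only tm_number mirrors a key name used in this notebook.
toy = pd.DataFrame({
    "tm_number": [1, 2, 3],
    "lodgement_date": ["2017-05-01", "2018-11-30", None],
    "status": ["registered", "pending", "pending"],
})

toy = parse_dates_demo(toy)
print(toy.dtypes)
```

Only the column with "date" in its name is converted, and the missing value becomes `NaT` rather than raising an error.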

## Execute

Run the code for the table you wish to explore.

In [0]:
ip_type = 'trademark'
table = 'process'
output_file = 'trademark-process-profile.html'

profile = profile_data(ip_type, table)
profile.to_file(output_file)
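
Since the request is about yearly counts, a direct tally per year may be easier to read than the full profile. The cell below is a minimal sketch, not part of the original analysis: `count_by_year` is a hypothetical helper, and `lodgement_date` and the rows are made up, so the real date column name must be checked against the table before use.

```python
import pandas as pd

def count_by_year(df, date_column):
    """Count rows per calendar year of date_column, ignoring missing dates."""
    years = df[date_column].dropna().dt.year
    return years.value_counts().sort_index()

# Made-up rows for illustration only; not IPGOD data.
toy = pd.DataFrame({
    "lodgement_date": pd.to_datetime(
        ["2016-03-01", "2017-07-15", "2017-09-02", "2018-01-20", None]
    )
})

counts = count_by_year(toy, "lodgement_date")
print(counts)
```

On the real table, the same helper could be applied to the frame returned by `ip_data('trademark', 'process')` with whichever date column is relevant to the question.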