<a href="https://colab.research.google.com/github/MinKimIP/IPA-public/blob/master/data_request/2019-12-19%20Geelong.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Request

19 December 2019

Hi IP Australia,

Just wondering what data is available at the local government area level regarding intellectual property?

I have seen some data for Greater Geelong for 2011 but was hoping for something more recent.

Thanks for your help.

Kind regards

---

This data request can be answered using [IPGOD 2019](https://data.gov.au/data/dataset/intellectual-property-government-open-data-2019).

Because other local governments may want similar data, this notebook takes a customisable approach: the local government area is a search term that can be changed before running the code.

## Scripts

Run the cell below without any changes.

In [0]:
import pandas as pd
import plotly.express as px

# data sources

def ip_data(ip_type, table):
    url_base = 'https://data.gov.au/data/dataset/a4210de2-9cbb-4d43-848d-46138fefd271/resource/'
    url = {'patent': {'process': '8fa6db74-a461-47f1-acc6-2e0cf7f06bd5/download/ipgod107.csv',
                      'applicant': '846990df-db42-4ad7-bbd6-567fd37a2797/download/ipgod102.csv',
                      'classification': '5aeec421-dddc-4c22-a66a-bfc5ad22947f/download/ipgod104.csv'},
           'trademark': {'process': '4dec358e-14ff-45ef-8b3e-b27274347e23/download/ipgod203.csv',
                         'applicant': 'aae1c14d-f8c0-4540-b5d3-1ed21500271e/download/ipgod202.csv',
                         'classification': 'fb505762-ab2a-4f56-999d-9bedd1da2ad5/download/ipgod204.csv'},
           'design': {'process': '9003a068-82fd-410d-a193-d54b8bc1f171/download/ipgod303.csv',
                      'applicant': '4b802e80-c667-4b84-8f50-72c2624c59c1/download/ipgod302.csv',
                      'classification': 'b01f7e00-a718-4e2d-9ffb-14938fd7dba9/download/ipgod304.csv'}}
    
    df = pd.read_csv(url_base+url[ip_type][table], low_memory=False)
    df = parse_dates(df)

    return df


main_key = {'patent': 'australian_appl_no',
            'trademark': 'tm_number',
            'design': 'application_id'}


# pipe components

def parse_dates(df):
    for column in df.columns:
        if "date" in column:
            df[column] = pd.to_datetime(df[column])
    
    return df


def relevant_applicant_data(ip_type, lga_name):
    is_in_lga = f'lga_name.fillna("").str.lower().str.contains("{lga_name.lower()}")'
    
    relevant_columns = [main_key[ip_type], 'ipa_id', 'name', 'abn', 'lga_name']
    
    df = (ip_data(ip_type, 'applicant')
            .query(is_in_lga, engine='python')
            [relevant_columns])
    
    return df


def relevant_process_data(ip_type):
    relevant_columns = {'patent': ['australian_appl_no',
                                   'patent_type',
                                   'application_date',
                                   'sealing_date'],
                        'trademark': ['tm_number',
                                      'type_of_mark_code',
                                      'lodgement_date',
                                      'registered_date'],
                        'design': ['application_id',
                                   'lodgement_date',
                                   'registration_date']}
    
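    df = (ip_data(ip_type, 'process')
            [relevant_columns[ip_type]]
            .copy())  # explicit copy, so adding a column below cannot raise SettingWithCopyWarning
    
    if ip_type == 'design':
        # no application-type column is selected for designs; add an empty one
        # so that all three IP types end up with the same columns
        df['type'] = ''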
    
    return df


def relevant_classification_data(ip_type):
    df = ip_data(ip_type, 'classification')
    
    query = {'patent': 'ipc_mark_type_code=="First (ie Primary)"',
             'trademark': 'tm_number==tm_number',
             'design': 'primary_class_code_ind'}
    
    relevant_columns = {'patent': ['australian_appl_no', 'ipc_mark_value'],
                        'trademark': ['tm_number', 'class_code'],
                        'design': ['application_id', 'class_code']}
    
    df = (df.query(query[ip_type])
            [relevant_columns[ip_type]])
    
    return df


def rename_columns(df, ip_type):
    column_rename_dict = {'patent': {'australian_appl_no': 'application_number',
                                     'patent_type': 'application_type',
                                     'ipc_mark_value': 'classification',
                                     'sealing_date': 'granted_date',
                                     'name': 'applicant_name'},
                          'trademark': {'tm_number': 'application_number',
                                        'type_of_mark_code': 'application_type',
                                        'class_code': 'classification',
                                        'lodgement_date': 'application_date',
                                        'registered_date': 'granted_date',
                                        'name': 'applicant_name'},
                          'design': {'application_id': 'application_number',
                                     'type': 'application_type',
                                     'class_code': 'classification',
                                     'lodgement_date': 'application_date',
                                     'registration_date': 'granted_date',
                                     'name': 'applicant_name'}}
    
    df = df.rename(columns = column_rename_dict[ip_type])

    return df


def create_type_column(df, ip_type):
    df['ip_type']=ip_type

    return df


def reorder_columns(df):
    ordered_columns = ['ip_type',
                       'application_type',
                       'application_number',
                       'application_date',
                       'granted_date',
                       'classification',
                       'lga_name',
                       'ipa_id',
                       'abn',
                       'applicant_name']
    df = df[ordered_columns]
    return df


# pipeline

def relevant_data(ip_type, lga_name):
    df = (relevant_applicant_data(ip_type, lga_name)
             .merge(relevant_process_data(ip_type), on=main_key[ip_type], how='left')
             .merge(relevant_classification_data(ip_type), on=main_key[ip_type], how='left')
             .drop_duplicates()
             .pipe(rename_columns, ip_type)
             .pipe(create_type_column, ip_type)
             .pipe(reorder_columns))
    
    return df


def get_ip_data_for_lga(lga_name):
    df = pd.concat([relevant_data('patent', lga_name),
                    relevant_data('trademark', lga_name),
                    relevant_data('design', lga_name)])
    
    return df


csv_preference = {'index': False,
                  'encoding': 'utf-8',
                  'date_format': '%Y-%m-%d',
                  'float_format': '%.0f'}


# visualisations

def visualise_counts_over_years(df, lga_name):
    df_counts = (df.assign(application_year=lambda x: x['application_date'].dt.year)
                   [['ip_type', 'application_number', 'application_year']]
                   .groupby(['ip_type', 'application_year']).count().reset_index()
                   .rename(columns={'application_number': 'application_count'})
                   .query('1970 < application_year < 2018'))
    
    viz = (px.line(df_counts,
                   x = 'application_year',
                   y = 'application_count',
                   color = 'ip_type',
                   title = f'Number of IP Right Applications by Applicants from {lga_name.capitalize()}',
                   labels = {'application_year':'Year of Application', 'application_count':'Application Count'},
                   color_discrete_sequence = px.colors.qualitative.Safe))

    return viz.show()
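
The filtering and joining steps in the script above can be tried on a small example without downloading the IPGOD files. The sketch below is an illustration only: the two miniature tables and the helper `filter_by_lga` are hypothetical, and the helper uses a plain substring match (`regex=False`) rather than the `query` string used in `relevant_applicant_data`.

```python
import pandas as pd

# Hypothetical miniature tables; the real ones are the IPGOD applicant and process CSVs.
applicants = pd.DataFrame({
    'tm_number': [1, 2, 3, 4],
    'name': ['A Pty Ltd', 'B Pty Ltd', 'C Pty Ltd', 'D Pty Ltd'],
    'lga_name': ['Greater Geelong (C)', 'Melbourne (C)', None, 'Greater Geelong (C)'],
})
process = pd.DataFrame({
    'tm_number': [1, 2, 3],
    'lodgement_date': pd.to_datetime(['2008-09-11', '2010-01-05', '2012-07-30']),
})


def filter_by_lga(df, lga_name):
    """Keep rows whose lga_name contains the search term, ignoring case."""
    mask = (df['lga_name'].fillna('')      # missing LGA names never match
              .str.lower()
              .str.contains(lga_name.lower(), regex=False))
    return df[mask]


matched = filter_by_lga(applicants, 'geelong')

# A left merge keeps every matched applicant, even when no process row exists.
combined = matched.merge(process, on='tm_number', how='left')
print(combined)
```

Because the merge is a left join, an applicant with no matching process record stays in the result with an empty date, rather than being dropped.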

## Get data

Change the local government area search term and the output file name below, then run the cell.

In [0]:
lga_search_term = 'Geelong'
save_to_file_name = 'ip_data_geelong.csv'

df = get_ip_data_for_lga(lga_name=lga_search_term)

df.to_csv(save_to_file_name, **csv_preference)
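
To see what the `csv_preference` options do to the saved file, the sketch below writes a two-row frame to a string instead of a file. The `sample` frame is made up for illustration (its values are borrowed from the sample output in this notebook); only the option dictionary mirrors the script.

```python
import pandas as pd

csv_preference = {'index': False,
                  'encoding': 'utf-8',
                  'date_format': '%Y-%m-%d',
                  'float_format': '%.0f'}

# Hypothetical two-row frame with one missing date and one missing ABN.
sample = pd.DataFrame({
    'application_number': [1261819, 1290892],
    'application_date': pd.to_datetime(['2008-09-11', None]),
    'abn': [51005930000.0, float('nan')],
})

# With no file name given, to_csv returns the CSV text as a string.
csv_text = sample.to_csv(**csv_preference)
print(csv_text)
```

In this sketch the float-valued `abn` column is written as a whole number without a trailing `.0`, dates are written as `YYYY-MM-DD`, and missing values become empty fields.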

In [48]:
df.sample(n=25)

| | ip_type | application_type | application_number | application_date | granted_date | classification | lga_name | ipa_id | abn | applicant_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 4255 | trademark | Trade Mark | 1261819 | 2008-09-11 | NaT | 3 | Greater Geelong (C) | 64329.0 | 51005930000.0 | Caron Laboratories Pty Ltd |
| 240 | design | | 199401532 | 1994-05-17 | 1994-07-01 | 25-01G | Greater Geelong (C) | 884066.0 | | non-entity |
| 1520 | trademark | Trade Mark | 826976 | 2000-03-09 | 2000-03-09 | 6 | Greater Geelong (C) | 145110.0 | 50007120000.0 | Lawvale Pty. Ltd. |
| 511 | trademark | Trade Mark | 480341 | 1988-01-28 | 1988-01-28 | 25 | Greater Geelong (C) | 222864.0 | 75004250000.0 | Target Australia Pty Ltd |
| 7306 | trademark | Trade Mark | 1688143 | 2015-04-17 | NaT | 35 | Greater Geelong (C) | 476480.0 | 72603350000.0 | YOUPAPARAZZI PTY LTD |
| 1947 | trademark | Trade Mark | 918184 | 2002-07-01 | 2002-07-01 | 35 | Greater Geelong (C) | 409498.0 | 86085690000.0 | CarMax Autospares Pty Ltd |
| 2008 | trademark | Trade Mark | 924714 | 2002-08-28 | 2002-08-28 | 36 | Greater Geelong (C) | 71424.0 | 21699250000.0 | Friends of Geelong Botanic Gardens Inc |
| 3397 | trademark | Trade Mark | 1142553 | 2006-10-24 | 2006-10-24 | 35 | Greater Geelong (C) | 293707.0 | 61058400000.0 | Cotton On Body Pty Ltd |
| 5048 | trademark | Trade Mark | 1372475 | 2010-07-14 | NaT | 27 | Greater Geelong (C) | 83390.0 | 58000850000.0 | Godfrey Hirst Australia Pty Ltd |
| 4474 | trademark | | 1290892 | NaT | NaT | | Greater Geelong (C) | 363874.0 | 73121940000.0 | APC Tri-Star Developments |


In [49]:
visualise_counts_over_years(df, lga_search_term)
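
The counting step behind the chart can be checked on a handful of rows. This is a minimal sketch with a made-up `toy` frame; it follows the same idea as `visualise_counts_over_years` (derive the year, count applications per IP type and year, keep years after 1970 and before 2018) but stops before plotting.

```python
import pandas as pd

# Hypothetical rows: two trademarks in 2008, one design in 1994,
# one trademark in 2018 and one patent with no application date.
toy = pd.DataFrame({
    'ip_type': ['trademark', 'trademark', 'design', 'trademark', 'patent'],
    'application_number': [101, 102, 201, 103, 301],
    'application_date': pd.to_datetime(
        ['2008-09-11', '2008-11-02', '1994-05-17', '2018-03-01', None]),
})

counts = (toy.assign(application_year=lambda x: x['application_date'].dt.year)
             .groupby(['ip_type', 'application_year'])['application_number']
             .count()
             .reset_index(name='application_count')
             .query('1970 < application_year < 2018'))
print(counts)
```

Two things are worth noticing in this sketch: the row without an application date drops out during grouping, and the 2018 application is excluded by the year filter, so neither would appear in the line chart.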