## Collaboration Score

## Cities

In the following, we will assign values for each of the added features in an attempt to create a scoring system for the collaborative efforts of cities. 

More specifically, we will be looking at the columns `has_business_collaboration` and `collaboration_area`. 

Starting with `has_business_collaboration`, we score each response from 0 to 5: 0 is assigned if no response is given, and 5 means that a business collaboration is already in place. The scoring methodology is as follows:

* 0: no response
* 1: No / Not intending to undertake / Do not know
* 2: Intending to undertake in future
* 3: Intending to undertake in the next 2 years
* 4: In progress
* 5: Yes

As for `collaboration_area`, the data does not provide enough information to distinguish clearly and reliably between the separate topics. However, following the idea of a holistic approach to addressing climate change, we consider it favorable if cities cooperate with businesses in multiple areas. Accordingly, we assign the following scores:

* 0: no response
* 1: one collaboration area
* 2: two collaboration areas
* 3: three collaboration areas
* 4: four to five collaboration areas 
* 5: more than five collaboration areas

Finally, we will aggregate the results in a final `city_collaboration_score` that will guide us in how well a particular city performs.

### c_score_1 : City-Business Collaboration

In [None]:
# assign scores for each response option
scores = {'No': 1,
          'Not intending to undertake': 1,
          'Do not know': 1,
          'Intending to undertake in future': 2,
          'Intending to undertake in the next 2 years': 3,
          'In progress': 4,
          'Yes': 5}

# create new business collaboration measure by mapping the scores to the respective response;
# missing responses (NaN) are not matched by the dictionary, so they are set to 0 explicitly
cid_new["has_business_collaboration_score"] = cid_new["has_business_collaboration"].map(scores).fillna(0).astype(int)
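The handling of missing responses can be checked on a small example. The following is a minimal sketch on hypothetical toy data (the series `responses` and the shortened `scores` dictionary are made up for illustration); it shows that `map` leaves missing values as `NaN`, so the score of 0 has to be filled in explicitly.

```python
import numpy as np
import pandas as pd

# Toy responses (hypothetical data); np.nan stands for "no response"
responses = pd.Series(["Yes", "In progress", np.nan, "Do not know"])

# shortened scoring dictionary, for illustration only
scores = {"No": 1, "Do not know": 1, "In progress": 4, "Yes": 5}

# map() leaves missing (and unmatched) entries as NaN; a dictionary key such as
# the string 'nan' would not match a real missing value
mapped = responses.map(scores)

# fill the gaps explicitly so that "no response" scores 0
toy_scores = mapped.fillna(0).astype(int)
print(toy_scores.tolist())
```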

### c_score_2 : City-Business Collaboration Area

In [None]:
# create a select key on which the information is mapped
cid_new["select_key"] = cid_new["year"].astype(str) + "_" + cid_new["account_number"].astype(str)

# count the entries per collaboration area for each key and add each area as a new column
area_counts = pd.crosstab(cid_new["select_key"], cid_new["collaboration_area"])
cid_new = cid_new.join(area_counts, on="select_key")

# sum the counts over all collaboration area columns (selected by name rather than by position)
cid_new["sum_area"] = cid_new[area_counts.columns].sum(axis=1)

# assign the respective score for each number of counts
def create_score(x):
    if x == 1:
        return 1
    elif x == 2:
        return 2
    elif x == 3:
        return 3
    elif x == 4 or x == 5:
        return 4
    elif x > 5:
        return 5
    else:
        return 0

cid_new["collaboration_area_score"] = cid_new["sum_area"].apply(create_score)

#drop all helper columns from the dataframe
cid_new.drop(['Other', 'Energy', 'Water', 'Waste', 'Transport',
       'Industry', 'Agriculture and Forestry',
       'Building and Infrastructure', 'Spatial Planning',
       'Social Services', 'Business and Financial Services', 'sum_area'], axis=1, inplace=True)
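As a cross-check of the binning, the same scheme can be expressed with `pd.cut`. This is only a sketch on hypothetical toy data, and it is a variation rather than the method used above: it counts distinct areas per key with `nunique` instead of summing crosstab columns.

```python
import pandas as pd

# Toy multi-row responses (hypothetical data): one row per selected area
toy = pd.DataFrame({
    "select_key": ["2020_1", "2020_1", "2020_2",
                   "2020_3", "2020_3", "2020_3", "2020_3"],
    "collaboration_area": ["Energy", "Water", None,
                           "Energy", "Waste", "Water", "Transport"],
})

# count distinct areas per key; missing areas are ignored by nunique()
n_areas = toy.groupby("select_key")["collaboration_area"].nunique()

# right-closed bins reproduce the scheme: 0 -> 0, 1 -> 1, 2 -> 2, 3 -> 3, 4-5 -> 4, >5 -> 5
bins = [-1, 0, 1, 2, 3, 5, float("inf")]
area_score = pd.cut(n_areas, bins=bins, labels=[0, 1, 2, 3, 4, 5]).astype(int)
print(area_score.to_dict())
```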

### Final Collaboration Scoring Table for Cities

In [None]:
cid_scores = cid_new[["account_number", "year", "has_business_collaboration_score", "collaboration_area_score"]]
# remove all duplicate entries that originate from the multi-row responses
cid_scores = cid_scores.drop_duplicates()
cid_scores = cid_scores.rename(columns={'has_business_collaboration_score': 'c_score_1', 'collaboration_area_score': 'c_score_2'})

In [None]:
cid_scores.to_pickle("data/cid_scores.pkl", protocol=4)
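
The aggregation into a final `city_collaboration_score` is not shown in this section. A minimal sketch of one possible aggregation follows, assuming an unweighted mean of the two sub-scores; the weighting is an assumption made for illustration, and the table `toy_cid_scores` contains hypothetical values.

```python
import pandas as pd

# Toy score table (hypothetical values) using the column names from above
toy_cid_scores = pd.DataFrame({
    "account_number": [1, 2, 3],
    "year": [2020, 2020, 2020],
    "c_score_1": [5, 1, 0],
    "c_score_2": [4, 2, 0],
})

# ASSUMPTION: both sub-scores carry equal weight (unweighted mean)
toy_cid_scores["city_collaboration_score"] = (
    toy_cid_scores[["c_score_1", "c_score_2"]].mean(axis=1)
)
print(toy_cid_scores)
```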

## Corporates

Next, we will assign values for each of the added features in an attempt to create a scoring model for the collaborative efforts of corporates. 

More specifically, we will be looking at the columns `value_chain_engagement`, `customer_engagement`, `supply_chain_engagement`, and `policy_engagement`. 

Starting with `value_chain_engagement`, we score each company from 0 to 5: 0 is assigned if no response is given, and 5 means that the company engages with suppliers, customers, and further partners. The scoring methodology is slightly more complicated than for the cities and is computed as follows:

* 0: No response
* 1: No, we do not engage
* 2: *Yes, our investee companies* or *Yes, other partners in the value chain*, with neither *Yes, our customers* nor *Yes, our suppliers* selected
* 3: Either *Yes, our suppliers* or *Yes, our customers*
* 4: Both *Yes, our suppliers* and *Yes, our customers*
* 5: Both *Yes, our suppliers* and *Yes, our customers*, plus *Yes, our investee companies* or *Yes, other partners in the value chain*
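
The rules above can be checked on a few hand-made answer sets. The following is a minimal sketch; the helper `value_chain_score` is hypothetical and works on plain Python sets rather than on the dataframe. Note that the order of the checks matters: "both" has to be tested before "either".

```python
def value_chain_score(answers):
    """Score a collection of selected response options (hypothetical helper)."""
    answers = set(answers)
    customers = "Yes, our customers" in answers
    suppliers = "Yes, our suppliers" in answers
    others = bool(answers & {"Yes, our investee companies",
                             "Yes, other partners in the value chain"})
    if "No, we do not engage" in answers:
        return 1
    # "both" must be tested before "either", otherwise 4 and 5 are unreachable
    if customers and suppliers:
        return 5 if others else 4
    if customers or suppliers:
        return 3
    if others:
        return 2
    return 0  # no response


print(value_chain_score(["Yes, our customers", "Yes, our suppliers"]))
print(value_chain_score(["Yes, our investee companies"]))
```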

### c_score_3 : Corporate Value Chain Engagement

In [None]:
# create new dataframe with value chain engagement responses
df_value = cor.query("question_number == 'C12.1'")[["account_number", "year", "entity", "response_answer"]]

# split response answer entries into individual response
df_value = split_response(df=df_value, column="response_answer", sep=";") 
df_value["response_answer"] = df_value["response_answer"].str.lstrip()

# create select key for mapping the response
df_value["select_key"] =df_value["year"].astype(str)+"_"+df_value["account_number"].astype(str)
cod_new["select_key"] =cod_new["year"].astype(str)+"_"+cod_new["account_number"].astype(str)

# count the entries of each individual response option and add each of them as a new column
df_value = df_value.join(pd.crosstab(df_value["select_key"], df_value["response_answer"]), on="select_key")

# define a function to convert our response methodology to scores
def conditions(s):
    customers = s["Yes, our customers"] >= 1
    suppliers = s["Yes, our suppliers"] >= 1
    others = (s["Yes, our investee companies"] >= 1) or (s["Yes, other partners in the value chain"] >= 1)
    if s["No, we do not engage"] >= 1:
        return 1
    # test "both" before "either"; otherwise the score of 4 can never be reached
    if customers and suppliers:
        return 5 if others else 4
    if customers or suppliers:
        return 3
    if others:
        return 2
    return 0

# apply the function to create value chain scores
df_value['value_chain_score'] = df_value.apply(conditions, axis=1)

# create new dataframe with corporate collaboration scores including the new value chain score;
# drop duplicate entries for each year and copy, so that later column assignments do not act on a view
cod_scores = df_value[["account_number", "year", "value_chain_score"]].drop_duplicates().copy()

### c_score_4: Corporate Supply Chain Engagement

In [None]:
df_supply = cor.query("question_number == 'C12.1a' and column_number == 1")[["account_number", "year", "entity", "response_answer"]]

# Combine all the `other. please specify` responses into one `Other` category
df_supply = df_supply.replace(df_supply.groupby('response_answer').sum().index[4:], 'Other')

# define scoring system (missing responses are not matched by the dictionary and are handled below)
scores = {'Compliance & onboarding': 1,
          'Information collection (understanding supplier behavior)': 2,
          'Other': 3,
          'Engagement & incentivization (changing supplier behavior)': 4,
          'Innovation & collaboration (changing markets)': 5}

# map scores to the respective response answer; missing responses score 0
df_supply["supply_score"] = df_supply["response_answer"].map(scores).fillna(0)

# choose max score for each entity in each year
df_supply['supply_chain_score'] = df_supply.groupby(['account_number', 'year'])['supply_score'].transform("max")

# create merge keys
df_supply["select_key"] = df_supply["year"].astype(str) + "_" + df_supply["account_number"].astype(str)
cod_scores["select_key"] = cod_scores["year"].astype(str) + "_" + cod_scores["account_number"].astype(str)

# merge new supply chain score to the collaboration scoring dataframe (one row per key)
cod_scores = pd.merge(left=cod_scores,
                      right=df_supply[["select_key", "supply_chain_score"]].drop_duplicates(),
                      on="select_key",
                      how="left")
cod_scores.drop_duplicates(inplace=True)
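
The "map, then take the maximum per company and year" pattern can be illustrated on a small example. This is a sketch on hypothetical toy data (the frame `toy` is made up); it assumes that a company's best response option should determine its score, as in the cell above.

```python
import numpy as np
import pandas as pd

# Toy multi-row responses (hypothetical data): company 1 selected two options
toy = pd.DataFrame({
    "account_number": [1, 1, 2, 3],
    "year": [2020, 2020, 2020, 2020],
    "response_answer": ["Compliance & onboarding",
                        "Innovation & collaboration (changing markets)",
                        "Other",
                        np.nan],
})

scores = {"Compliance & onboarding": 1,
          "Information collection (understanding supplier behavior)": 2,
          "Other": 3,
          "Engagement & incentivization (changing supplier behavior)": 4,
          "Innovation & collaboration (changing markets)": 5}

# missing responses are not matched by the dictionary and are set to 0
toy["supply_score"] = toy["response_answer"].map(scores).fillna(0)

# broadcast the best score of each company-year back to all of its rows
toy["supply_chain_score"] = (
    toy.groupby(["account_number", "year"])["supply_score"].transform("max")
)

# one row per company and year
per_company = toy[["account_number", "year", "supply_chain_score"]].drop_duplicates()
print(per_company)
```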

### c_score_5: Corporate Customer Engagement

In [None]:
df_customer = cor.query("question_number == 'C12.1b' and column_number == 1")[["account_number", "year", "entity", "response_answer"]]

# replace all of the individual "other" specifications into a single "other" group
df_customer = df_customer.replace(df_customer.groupby('response_answer').sum().index[5:], 'Other')

# we merge the two Education/information sharing response options into one response
df_customer["response_answer"] = df_customer['response_answer'].str.replace('Education/information sharing : Engagement','Education/information sharing')

# we assign values for each response option (missing responses are handled below)
scores = {'Information collection (understanding customer behavior)': 1,
          'Education/information sharing': 2,
          'Other': 3,
          'Engagement & incentivization (changing customer behavior)': 4,
          'Collaboration & innovation': 5}

# map scores to the respective response answer; missing responses score 0
df_customer["customer_score"] = df_customer["response_answer"].map(scores).fillna(0)

# choose max score for each entity in each year
df_customer['customer_score'] = df_customer.groupby(['account_number', 'year'])['customer_score'].transform("max")

# create select key for merging to collaboration scoring dataframe
df_customer["select_key"] = df_customer["year"].astype(str) + "_" + df_customer["account_number"].astype(str)

# merge new customer score to the collaboration scoring dataframe (one row per key)
cod_scores = pd.merge(left=cod_scores,
                      right=df_customer[["select_key", "customer_score"]].drop_duplicates(),
                      on="select_key",
                      how="left")
cod_scores.drop_duplicates(inplace=True)

### c_score_6: Corporate Policy Engagement

Here, the scoring is a little more complicated and even more subjective than for the other engagement scores. 
Again, we follow our perspective that climate resilience is enhanced when businesses and policy makers work together rather than alone. However, as noted by the think tank InfluenceMap, only a few of the influential corporations engage positively on climate policy globally, with most holding either a neutral or negative position. This makes a coherent scoring more difficult. On the one hand, we intend to promote purposeful engagement of corporates with policy makers. On the other hand, we only perceive those policy engagements as positive where businesses support the view of local policy makers. Unfortunately, the data provided only offers this information for the response option *Direct engagement with policy makers*.

To account for this view, we focus on the direct engagement with policy makers and combine this response with the corporate position on policy decisions. Ultimately, we apply the following scoring methodology:

* 0: No response
* 1: No
* 2: Trade associations / Funding research organizations / Other / Direct engagement with policy makers & either no corporate position provided or position is opposing/neutral/undecided
* 3: Direct engagement with policy makers & support with major exceptions
* 4: Direct engagement with policy makers & support with minor exceptions
* 5: Direct engagement with policy makers & supportive corporate position

In [None]:
# extract question from response dataset into separate dataframe
df_policy = cor.query("question_number == 'C12.3'")[["account_number", "year", "entity", "response_answer"]]

# split chained response answers
df_policy = split_response(df=df_policy, column="response_answer", sep=";")

# remove whitespace in front of response options
df_policy["response_answer"] = df_policy["response_answer"].str.lstrip()

# create select key to match information
df_policy["select_key"] =df_policy["year"].astype(str)+"_"+df_policy["account_number"].astype(str)

# create new dataframe for the corporate position
df_position = cor.query("question_number == 'C12.3a' and column_number == 2")[["account_number", "year", "response_answer"]]

# create merge key
df_position["select_key"] =df_position["year"].astype(str)+"_"+df_position["account_number"].astype(str)

# convert response options to columns
df_position = df_position.join(pd.crosstab(df_position["select_key"], df_position["response_answer"]), on="select_key")

**Note:**

In the next step, we create a helper column that defines the majority position a company takes on policy issues. This is necessary because the position is a multi-response column: each position is assigned to a policy topic (e.g. Energy), so a single company can hold multiple positions in a year. For simplification, we take the majority position that a corporate holds in a year.
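
The majority-position idea can be sketched on hypothetical toy data as follows (the frame `toy` is made up for illustration). One caveat worth noting: `idxmax` returns the first column with the highest count, so ties are resolved by the alphabetical column order that `pd.crosstab` produces.

```python
import pandas as pd

# Toy positions (hypothetical data): one row per policy topic and company-year
toy = pd.DataFrame({
    "select_key": ["2020_1", "2020_1", "2020_1", "2020_2", "2020_2"],
    "response_answer": ["Support", "Support", "Oppose",
                        "Neutral", "Support with minor exceptions"],
})

# count how often each position occurs per key
counts = pd.crosstab(toy["select_key"], toy["response_answer"])

# the column with the highest count is the majority position;
# for the tie in 2020_2 the alphabetically first column ("Neutral") wins
majority = counts.idxmax(axis=1)
print(majority)
```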

In [None]:
# create a helper column with majority position
df_position["majority_position"] = df_position[["Neutral", "Oppose", "Support", "Support with major exceptions", "Support with minor exceptions", "Undecided"]].idxmax(axis=1)

# merge majority position to policy dataframe (one row per key)
df_policy = pd.merge(left=df_policy,
                     right=df_position[["select_key", "majority_position"]].drop_duplicates(),
                     on="select_key",
                     how="left")
df_policy.drop_duplicates(inplace=True)

# define the conditions based on which scores are assigned
conditions = [df_policy["response_answer"].eq("Direct engagement with policy makers") & df_policy["majority_position"].eq("Support"), 
              df_policy["response_answer"].eq("Direct engagement with policy makers") & df_policy["majority_position"].eq("Oppose"),
              df_policy["response_answer"].eq("Direct engagement with policy makers") & df_policy["majority_position"].eq("Undecided"),
              df_policy["response_answer"].eq("Direct engagement with policy makers") & df_policy["majority_position"].eq("Neutral"),
              df_policy["response_answer"].eq("Direct engagement with policy makers") & df_policy["majority_position"].isna(),
              df_policy["response_answer"].eq("Direct engagement with policy makers") & df_policy["majority_position"].eq("Support with minor exceptions"),
              df_policy["response_answer"].eq("Direct engagement with policy makers") & df_policy["majority_position"].eq("Support with major exceptions"),
              df_policy["response_answer"].eq("Funding research organizations") | df_policy["response_answer"].eq("Trade associations"),
              df_policy["response_answer"].eq("Other"),
              df_policy["response_answer"].eq("No")]

choices = [5, 2, 2, 2, 2, 4, 3, 2, 2, 1]

df_policy["policy_score"] = np.select(conditions, choices, default=0)

# choose max score for each entity in each year
df_policy['policy_score'] = df_policy.groupby(['account_number', 'year'])['policy_score'].transform("max")

# merge final policy score results to collaboration scoring dataframe (one row per key)
cod_scores = pd.merge(left=cod_scores,
                      right=df_policy[["select_key", "policy_score"]].drop_duplicates(),
                      on="select_key",
                      how="left")
cod_scores.drop_duplicates(inplace=True)
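
The policy scoring rules can be tried out on a small example. This is a sketch on hypothetical toy data, with a condensed condition list rather than the exact one used above; it relies on `np.select` picking the first matching condition, so the catch-all `direct` entry covers opposing, neutral, undecided, and missing positions.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): engagement type and majority position per row
toy = pd.DataFrame({
    "response_answer": ["Direct engagement with policy makers"] * 3
                       + ["Trade associations", "No"],
    "majority_position": ["Support", np.nan, "Support with major exceptions",
                          np.nan, np.nan],
})

direct = toy["response_answer"].eq("Direct engagement with policy makers")
position = toy["majority_position"]

# np.select uses the first condition that is True for each row
conditions = [
    direct & position.eq("Support"),
    direct & position.eq("Support with minor exceptions"),
    direct & position.eq("Support with major exceptions"),
    direct,  # any other position, including a missing one (NaN)
    toy["response_answer"].isin(["Funding research organizations",
                                 "Trade associations", "Other"]),
    toy["response_answer"].eq("No"),
]
choices = [5, 4, 3, 2, 2, 1]

toy["policy_score"] = np.select(conditions, choices, default=0)
print(toy)
```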

### Final Collaboration Scoring Table for Corporates

In [None]:
# number the scores in the order of the section headings above (c_score_4: supply chain, c_score_5: customer)
cod_scores.rename(columns={'value_chain_score': 'c_score_3', 'supply_chain_score': 'c_score_4', 'customer_score': 'c_score_5', 'policy_score': 'c_score_6'}, inplace=True)
cod_scores.drop("select_key", axis=1, inplace=True)

In [None]:
cod_scores.to_pickle("data/cod_scores.pkl", protocol=4)