# Stale epel7/8 updates in testing

## Introduction

This notebook checks for updates that have been sitting in testing for too long on the EPEL 7 and EPEL 8 branches.

The issue was reported in the following ticket: https://pagure.io/epel/issue/230

## Scraping information

First we need to gather the information relevant to this ticket. There are many packages, each with its own circumstances, so we need to sort the data in a way that makes it easier to work with.

The next cell lists the base URLs that provide the data needed for the analysis:

In [1]:
# Query and filter the updates mentioned in the ticket; querying this URL directly returns the information in JSON format
QUERY_URL = "https://bodhi.fedoraproject.org/updates/"

# Information about the projects, such as who owns them and the associated contributors
SRC_URL = "https://src.fedoraproject.org/api/0/rpms/"

# Information about the release status of a project
RELEASE_URL = "https://src.fedoraproject.org/_dg/bodhi_updates/rpms/"

### Cache downloaded stuff

To make things simpler, we create a cache folder. It can be deleted at any time to force the queries to run again.

In [2]:
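import os

# exist_ok avoids an error when the folder is already there,
# while other failures (for example, missing permissions) are still reported
os.makedirs("cache", exist_ok=True)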
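
The download cells that follow all use the same pattern: read a JSON file from `cache/` if it exists, otherwise fetch the data and write the file. As a minimal sketch, that pattern could be factored into one helper. The name `load_cached` and the `fetch` callback are illustrative and not part of this notebook.

```python
import json
import os


def load_cached(path, fetch):
    """Return JSON data from `path`, calling `fetch()` only on a cache miss."""
    if os.path.exists(path):
        with open(path, "r") as fp:
            return json.load(fp)

    data = fetch()
    # Create the parent folder on demand so the helper also works standalone.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as fp:
        json.dump(data, fp, indent=4)
    return data
```

A call would look like `load_cached("cache/updates.json", download_updates)`, where `download_updates` is a hypothetical function that performs the query. Deleting the file forces a fresh download on the next call.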

### Update requests from Bodhi

We need to download from Bodhi the list of updates to check.

The results are stored in the `updates` list.

This block also saves the results to `cache/updates.json`. If that file already exists, it is used instead of querying Bodhi, so remove it when you need to refresh the data.

In [3]:
import urllib.request
import urllib.parse
import os.path
import json

#TODO: change query to last 6 months? last year?
args = (
    ('submitted_before',2023),
    ('status','pending'),
    ('status','testing'),
    ('releases','EPEL-7'),
    ('releases','EPEL-8'),
)

page = 1
updates = []

if os.path.exists("cache/updates.json"):
    print("Reading from cache")
    # Opening JSON file
    with open('cache/updates.json', 'r') as openfile:
        # Reading from json file
        updates = json.load(openfile)
else:
    while True:
        ARGS = urllib.parse.urlencode(args + (('page', page),) )
        print("Loading page %d (%s)" % (page, QUERY_URL + '?' + ARGS))

        f = urllib.request.urlopen(QUERY_URL + '?' + ARGS)

        content = f.read()
        data = json.loads(content)
        updates = updates + data["updates"]
        del data["updates"]
        print(" -> Got page %d of %d" % (data['page'], data['pages']))
        if data["page"] == data["pages"]:
            break
        page += 1

    with open("cache/updates.json", "w") as fp:
        fp.write(json.dumps(updates, indent=4))

print("Loaded %d updates" % (len(updates)))

Reading from cache
Loaded 144 updates
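
Two details of the cell above are easy to miss: `urlencode` accepts a sequence of pairs, so a key such as `status` can appear more than once, and the loop stops when the `page` field of the response reaches `pages`. The sketch below isolates both. `fetch_page` is a hypothetical stand-in for the HTTP request, so the loop can run without network access, and the fake responses only carry the fields this notebook reads.

```python
import urllib.parse


def build_query(base_url, args, page):
    """Append the (possibly repeated) arguments and the page number to base_url."""
    return base_url + "?" + urllib.parse.urlencode(tuple(args) + (("page", page),))


def collect_updates(fetch_page):
    """Gather the 'updates' lists of every page returned by fetch_page(page)."""
    updates = []
    page = 1
    while True:
        data = fetch_page(page)
        updates.extend(data["updates"])
        # >= also ends the loop if the server reports fewer pages than requested.
        if data["page"] >= data["pages"]:
            return updates
        page += 1


# Fake two-page response (made-up content).
fake_pages = {
    1: {"updates": ["a", "b"], "page": 1, "pages": 2},
    2: {"updates": ["c"], "page": 2, "pages": 2},
}
print(build_query("https://bodhi.fedoraproject.org/updates/",
                  (("status", "pending"), ("status", "testing")), 1))
print(collect_updates(fake_pages.__getitem__))
```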


### List the involved projects

The project names are extracted by parsing the title of each update.

*There may be better ways to do this, so this step might need to be revisited in the future*.


In [4]:
# Project list
import re

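def project_name(t):
    """Return the package name from a build string such as name-version-release."""
    # Keep the dash-separated parts up to the first one made only of digits
    # and dots, which is assumed to be the version.
    sts = t.split("-")
    r = []
    for st in sts:
        if re.match(r'^[\d.]+$', st):
            break
        r.append(st)
    return "-".join(r)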

proj_list = set()
for update in updates:
    proj_list.update([project_name(t) for t in update['title'].split()])

proj_list = list(proj_list)
print("%d projects' source info loaded" % (len(proj_list)))


152 projects' source info loaded
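
One possible alternative to matching version-like segments is sketched here. An RPM build name has the form name-version-release, and neither the version nor the release may contain a hyphen, so splitting twice from the right recovers the name. This assumes every word in an update title is a complete name-version-release string, which this notebook does not verify. The function name and the sample strings are illustrative.

```python
def name_from_nvr(nvr):
    """Strip the trailing -version-release pair from an RPM NVR string."""
    parts = nvr.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError("not a name-version-release string: %r" % nvr)
    return parts[0]


for nvr in ("cherokee-1.2.104-1.el7", "python-foo-1.0rc1-2.el8"):
    print(nvr, "->", name_from_nvr(nvr))
```

The second sample has a version containing letters (`1.0rc1`). The digits-and-dots check in `project_name` would not recognize it as a version and would return the whole string, while the right-split still yields `python-foo`.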


### Project information

We need to download the information for each project in the list from src.fedoraproject.org.

The results are stored in the `update_projects` dictionary.

This block also saves the results to `cache/projects.json`. If that file already exists, it is used instead of querying the server, so remove it when you need to refresh the data.

In [5]:
update_projects = {}

if os.path.exists("cache/projects.json"):
    print("Reading from cache")
    # Opening JSON file
    with open('cache/projects.json', 'r') as openfile:
        # Reading from json file
        update_projects = json.load(openfile)    
else:

    for title in proj_list:
        print("Loading %s" % title)
        f = urllib.request.urlopen(SRC_URL + title)

        content = f.read()

        project_info = json.loads(content.decode("utf8"))
        update_projects[title] = project_info
    
    with open("cache/projects.json", "w") as fp:
        fp.write(json.dumps(update_projects, indent=4))

print("%d projects' release info loaded" % (len(update_projects.keys())))

Reading from cache
152 projects' release info loaded


### Downloading release information

Release information is important to see the state of the project.

For example, if it's currently retired from the update branch or from the current release, or if it's a new package for the branch.

In [6]:
update_releases = {}

if os.path.exists("cache/releases.json"):
    print("Reading from cache")
    # Opening JSON file
    with open('cache/releases.json', 'r') as openfile:
        # Reading from json file
        update_releases = json.load(openfile)    
else:

    for title in proj_list:
        print("Loading %s" % title)
        f = urllib.request.urlopen(RELEASE_URL + title)

        content = f.read()

        release_info = json.loads(content.decode("utf8"))
        update_releases[title] = release_info
    
    with open("cache/releases.json", "w") as fp:
        fp.write(json.dumps(update_releases, indent=4))

print("Loaded %d releases" % (len(update_releases.keys())))

Reading from cache
Loaded 152 releases


## Organize data

The following blocks organize the data so that it is easier to access.

### Add project information to update data

Here we merge the project information into the corresponding updates. In the current scenario there are no duplicate projects, but that is something that should be taken into account.

In [7]:
# Process the updates in submission order, oldest first
updates.sort(key=lambda x: x['date_submitted'])

for update in updates:
    proj_titles = [project_name(t) for t in update['title'].split()]
    projects = []

    for title in proj_titles:
        # Attach the release info to the project before linking it to the update
        update_projects[title]["release_info"] = update_releases[title]
        projects.append(update_projects[title])
    update["projects"] = projects
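
The merge step stores the same project dictionary in every update that mentions it and would list a project twice if a title named it twice. As a hedged sketch of one way to handle duplicates, the function below drops repeated names and copies each project record before attaching the release info. `attach_projects` and its parameters are hypothetical names, and the data is a toy example rather than real Bodhi output.

```python
def attach_projects(update, projects, releases, name_of):
    """Return a copy of `update` with a 'projects' list, one entry per distinct project."""
    # dict.fromkeys keeps first-seen order while dropping repeated names.
    names = dict.fromkeys(name_of(t) for t in update["title"].split())
    merged = []
    for name in names:
        info = dict(projects[name])  # shallow copy, the input stays untouched
        info["release_info"] = releases[name]
        merged.append(info)
    return dict(update, projects=merged)


# Toy data, not taken from Bodhi.
projects = {"foo": {"name": "foo"}}
releases = {"foo": {"updates": {}}}
update = {"title": "foo-1.0-1.el7 foo-1.0-2.el7"}
result = attach_projects(update, projects, releases, lambda t: t.rsplit("-", 2)[0])
print([p["name"] for p in result["projects"]])
```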

### Group by user

Just to get an idea of how many users are involved with these updates.

*It isn't used in the rest of the analysis, but it's a fun number to know*

In [8]:
users = []
for update in updates:
    name = update['user']['name']
    if name not in users:
        users.append(name)

user_pkgs = [ {
    'name' : user,
    'total' : len([update['title'] for update in updates if update['user']['name'] == user]),
    'updates' : [ update for update in updates if update['user']['name'] == user]
    } for user in users ]

print("Number of users related to submissions: %d" % len(user_pkgs))

Number of users related to submissions: 44
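
The cell above scans the full update list once per user. A single-pass grouping with `collections.defaultdict` gives the same per-user lists. This is only a sketch: `group_by_user` is an illustrative name and the sample records are made up, sharing only the `user`/`name` nesting with the real data.

```python
from collections import defaultdict


def group_by_user(updates):
    """Map each submitter name to the list of updates they submitted."""
    grouped = defaultdict(list)
    for update in updates:
        grouped[update["user"]["name"]].append(update)
    return dict(grouped)


# Toy records with the same nesting as the update data used here.
sample = [
    {"title": "a-1-1.el7", "user": {"name": "alice"}},
    {"title": "b-1-1.el7", "user": {"name": "bob"}},
    {"title": "c-1-1.el8", "user": {"name": "alice"}},
]
by_user = group_by_user(sample)
print({name: len(items) for name, items in by_user.items()})
```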


### Filter users with only one update

*Just more fun stats to know about*

In [9]:
have_1 = [x for x in user_pkgs if x['total'] == 1]
have_1.sort(key=lambda x: x['updates'][0]['date_submitted'])

print("Number of users that submitted only one update: %d" % len(have_1))

Number of users that submitted only one update: 33


### Filter orphaned projects

Updates for orphaned projects can be discarded easily, so let's filter them out.

Here we separate the updates whose projects are all orphaned from the ones where only some of the projects are orphaned.

I also separated the updates depending on the level of involvement of the user who pushed the update.

In [10]:
orphaned = {
    'all': {
        'orphan': [],
        'contrib': [],
        'rest': [],
    },
    'some': {
        'owner': [],
        'contrib': [],
        'rest': [],
    },
}

active = []

pre_len = len(updates)

for update in updates:
    owner = 0
    contrib = 0
    orphan = 0
    half_orphan = 0

    for project in update['projects']:
        # Every user with some level of access, plus the main owner
        names = list(set(sum(
            [
                project['access_users']['admin'],
                project['access_users']['owner'],
                project['access_users']['commit'],
                project['access_users']['collaborator'],
                [project['user']['name']]], [])))

        if project['user']['name'] == update['user']['name']:
            owner += 1
        elif update['user']['name'] in names:
            contrib += 1

        if project['user']['name'] == 'orphan':
            if len(names) == 1:
                orphan += 1
            else:
                half_orphan += 1


    if orphan + half_orphan > 0:
        if orphan + half_orphan == len(update['projects']):
            if orphan == len(update['projects']):
                orphaned['all']['orphan'].append(update)
            elif contrib > 0:
                orphaned['all']['contrib'].append(update)
            else:
                orphaned['all']['rest'].append(update)
        else:
            if owner > 0:
                orphaned['some']['owner'].append(update)
            elif contrib > 0:
                orphaned['some']['contrib'].append(update)
            else:
                orphaned['some']['rest'].append(update)
    else:
        active.append(update)

updates = active

print("%d orphaned projects, %d projects left" % (pre_len - len(updates), len(updates)))


104 orphaned projects, 40 projects left
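
The decision rules in the orphan filter can be condensed into a function that returns the bucket key for a single update, which makes them easier to check on small inputs. This is a sketch that mirrors the cell's logic as read from the code. `classify_orphan`, `related_users`, and `toy_project` are illustrative names, and the sample data is made up.

```python
def related_users(project):
    """All user names with any access level on the project, plus its main owner."""
    names = {project["user"]["name"]}
    for level in ("admin", "owner", "commit", "collaborator"):
        names.update(project["access_users"].get(level, []))
    return names


def classify_orphan(update):
    """Return the (group, bucket) key for an update, or None if nothing is orphaned."""
    submitter = update["user"]["name"]
    total = len(update["projects"])
    owner = contrib = orphan = half_orphan = 0

    for project in update["projects"]:
        names = related_users(project)
        main = project["user"]["name"]
        if main == submitter:
            owner += 1
        elif submitter in names:
            contrib += 1
        if main == "orphan":
            if len(names) == 1:
                orphan += 1       # nobody but 'orphan' has access
            else:
                half_orphan += 1  # orphaned, but other users still have access

    if orphan + half_orphan == 0:
        return None
    if orphan + half_orphan == total:
        if orphan == total:
            return ("all", "orphan")
        return ("all", "contrib" if contrib else "rest")
    if owner:
        return ("some", "owner")
    return ("some", "contrib" if contrib else "rest")


def toy_project(main, commit=()):
    """Build a minimal project record for experiments (made-up data)."""
    return {"user": {"name": main},
            "access_users": {"admin": [], "owner": [main],
                             "commit": list(commit), "collaborator": []}}


sample = {"user": {"name": "alice"},
          "projects": [toy_project("orphan", commit=["alice"])]}
print(classify_orphan(sample))
```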


In [11]:
print("Updates with orphaned projects:")
print()

proc_sum = 0

for k1 in orphaned:
    if type(orphaned[k1]) is list:
        if len(orphaned[k1]) > 0:
            print("%s: %d" % (k1, len(orphaned[k1])))
            print()
            proc_sum += len(orphaned[k1])
    else:
        k1_sum = 0
        for k2 in orphaned[k1]:
            if len(orphaned[k1][k2]) > 0:
                print("%s %s: %d" % (k1, k2, len(orphaned[k1][k2])))
                proc_sum += len(orphaned[k1][k2])
                k1_sum += len(orphaned[k1][k2])
        if k1_sum > 0:
            print("%s: %d" % (k1, k1_sum))
            print()
            
print("total %d" % (proc_sum))

Updates with orphaned projects:

all orphan: 52
all contrib: 49
all rest: 3
all: 104

total 104
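
The summary cell above is repeated almost verbatim for the other bucket dictionaries in this notebook. A small helper that returns the summary lines, sketched here under the assumption that every value is either a list or a dictionary of lists, would avoid the repetition. `summarize` is an illustrative name, and unlike the cell it does not emit blank separator lines.

```python
def summarize(buckets):
    """Return the summary lines for a dict whose values are lists or dicts of lists."""
    lines = []
    total = 0
    for key, value in buckets.items():
        if isinstance(value, list):
            if value:
                lines.append("%s: %d" % (key, len(value)))
                total += len(value)
            continue
        subtotal = 0
        for sub_key, items in value.items():
            if items:
                lines.append("%s %s: %d" % (key, sub_key, len(items)))
                subtotal += len(items)
        if subtotal:
            lines.append("%s: %d" % (key, subtotal))
        total += subtotal
    lines.append("total %d" % total)
    return lines


# Toy buckets; the integers stand in for update records.
demo = {
    "all": {"orphan": [1, 2], "contrib": [3], "rest": []},
    "some": {"owner": [], "contrib": [], "rest": []},
    "flat": [4],
}
print("\n".join(summarize(demo)))
```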


### Filter projects without stable releases

Retired projects are harder to detect, because they can get confused with projects that don't have an actual branch and never got a release.

In [12]:
retired = {
    'some': {
        'rawhide': [],
        'epel': []
    },
    'all': {
        'rawhide': [],
        'epel': []
    },
    'rawhide': []
}

active = []

# TODO: remove hardcoded rawhide branch
rawhide = "F39"

pre_len = len(updates)

for update in updates:
    release_name = update['release']['name']
    rawhide_count = 0
    retired_count = 0
    for project in update['projects']:
        if project['release_info']['updates'][release_name]['stable'] is None:
            retired_count += 1
        if project['release_info']['updates'][rawhide]['stable'] is None:
            rawhide_count += 1

    if retired_count == 0:
        if rawhide_count > 0:
            retired['rawhide'].append(update)
        else:
            active.append(update)
    elif retired_count == len(update['projects']):
        if rawhide_count > 0:
            retired['all']['rawhide'].append(update)
        else:
            retired['all']['epel'].append(update)
    else:
        if rawhide_count > 0:
            retired['some']['rawhide'].append(update)
        else:
            retired['some']['epel'].append(update)

updates = active

print("%d possibly retired projects, %d projects left" % (pre_len - len(updates), len(updates)))


19 possibly retired projects, 21 projects left


In [13]:
print("Projects without stable releases:")
print()

proc_sum = 0
for k1 in retired:
    if type(retired[k1]) is list:
        if len(retired[k1]) > 0:
            print("%s: %d" % (k1, len(retired[k1])))
            print()
            proc_sum += len(retired[k1])
    else:
        k1_sum = 0
        for k2 in retired[k1]:
            if len(retired[k1][k2]) > 0:
                print("%s %s: %d" % (k1, k2, len(retired[k1][k2])))
                proc_sum += len(retired[k1][k2])
                k1_sum += len(retired[k1][k2])
        if k1_sum > 0:
            print("%s: %d" % (k1, k1_sum))
            print()
            
print("total %d" % (proc_sum))

Projects without stable releases:

all rawhide: 1
all epel: 13
all: 14

rawhide: 5

total 19
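
The retired-project filter indexes `release_info['updates']` directly by release name, which raises `KeyError` if a release is missing from a project's record. Whether the endpoint always lists every release is not established here, so the sketch below treats a missing entry as "no stable build". `has_stable` is an illustrative name and the record is made up, shaped only like the fields the filter reads.

```python
def has_stable(release_info, release_name):
    """True if the project has a stable build recorded for release_name."""
    entry = release_info.get("updates", {}).get(release_name) or {}
    return entry.get("stable") is not None


# Toy record; the values are invented for illustration.
info = {"updates": {"EPEL-7": {"stable": None},
                    "F39": {"stable": "foo-1.0-1.fc39"}}}
print(has_stable(info, "EPEL-7"), has_stable(info, "F39"), has_stable(info, "EPEL-8"))
```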


### Organize by uploader status

Here I organized the updates in the `up_results` dictionary to get a proper view of how the update submitter is involved in the projects. I separated them into the following categories:

- owner: the submitter is owner of at least one of the projects in the update
    - all: the submitter is owner of all the projects
    - contrib: the submitter is contributor to the projects they don't own
    - rest: the submitter is not directly related to some of the projects (maybe through a group?)
- contrib: the submitter is not owner of any project, but is contributor to at least one of them
    - contrib: the submitter contributes to all the projects
    - rest: the submitter contributes to at least one project, but their relation to the other projects is not confirmed
- group: the submitter is not in the list of users related to the project, but there are SIGs that have permissions to maintain the project.

The rest were submitted by people not in the list of users related to the project, so they might have just left the project or are possibly related to a group.

In [14]:
up_results = {
    'owner': {
        'all': [],
        'contrib': [],
        'rest': [],
    },
    'contrib': {
        'contrib': [],
        'rest': []
    },
    'group': []
}

unrelated = []

for update in updates:
    owner = 0
    contrib = 0
    c_groups = 0

    for project in update['projects']:
        # Every user with some level of access, plus the main owner
        names = list(set(sum(
            [
                project['access_users']['admin'],
                project['access_users']['owner'],
                project['access_users']['commit'],
                project['access_users']['collaborator'],
                [project['user']['name']]], [])))

        groups = project['access_groups']
        groups = list(set(groups['admin'] + groups['commit']))
        if project['user']['name'] == update['user']['name']:
            owner += 1
        elif update['user']['name'] in names:
            contrib += 1
            
        if len(groups) > 0:
            c_groups += 1

    if owner > 0:
        if owner == len(update['projects']):
            up_results['owner']['all'].append(update)
        elif owner + contrib == len(update['projects']):
            up_results['owner']['contrib'].append(update)
        else:
            up_results['owner']['rest'].append(update)

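    elif contrib > 0:
        # NOTE: contributor-only updates are appended to the 'owner' buckets,
        # so the 'contrib' buckets of up_results stay empty in the summary
        if contrib == len(update['projects']):
            up_results['owner']['contrib'].append(update)
        else:
            up_results['owner']['rest'].append(update)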
    else:
        if c_groups > 0:
            up_results['group'].append(update)
        else:
            unrelated.append(update)

print("%d projects left" % (len(updates) - len(unrelated)))

16 projects left
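
As a sketch, the category list for the submitter's status can be written as a function that returns the bucket key for one update. It follows the written category descriptions, so updates where the submitter is only a contributor get a `contrib` key, whereas the cell as written appends them to the `owner` buckets. `classify_submitter` and `make_project` are illustrative names, and the sample data is made up.

```python
def classify_submitter(update):
    """Return the bucket key for an update, or None if the submitter is unrelated."""
    submitter = update["user"]["name"]
    total = len(update["projects"])
    owner = contrib = with_groups = 0

    for project in update["projects"]:
        users = {project["user"]["name"]}
        for level in ("admin", "owner", "commit", "collaborator"):
            users.update(project["access_users"].get(level, []))
        groups = project["access_groups"]
        if groups.get("admin") or groups.get("commit"):
            with_groups += 1
        if project["user"]["name"] == submitter:
            owner += 1
        elif submitter in users:
            contrib += 1

    if owner:
        if owner == total:
            return ("owner", "all")
        return ("owner", "contrib" if owner + contrib == total else "rest")
    if contrib:
        return ("contrib", "contrib" if contrib == total else "rest")
    return ("group",) if with_groups else None


def make_project(main, users=(), groups=()):
    """Build a minimal project record for experiments (made-up data)."""
    return {"user": {"name": main},
            "access_users": {"owner": [main], "admin": [],
                             "commit": list(users), "collaborator": []},
            "access_groups": {"admin": [], "commit": list(groups), "collaborator": []}}


sample = {"user": {"name": "alice"},
          "projects": [make_project("alice"), make_project("bob", users=["alice"])]}
print(classify_submitter(sample))
```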


In [15]:
print("Updates where the submitter is currently unrelated to the project")
print()
if len(unrelated) > 0:
    print("unrelated: %d" % (len(unrelated)))
    print()

print("Updates that need to get notified directly")
print()    
    
proc_sum = 0
for k1 in up_results:
    if type(up_results[k1]) is list:
        if len(up_results[k1]) > 0:
            print("%s: %d" % (k1, len(up_results[k1])))
            print()
            proc_sum += len(up_results[k1])
    else:
        k1_sum = 0
        for k2 in up_results[k1]:
            if len(up_results[k1][k2]) > 0:
                print("%s %s: %d" % (k1, k2, len(up_results[k1][k2])))
                proc_sum += len(up_results[k1][k2])
                k1_sum += len(up_results[k1][k2])
        if k1_sum > 0:
            print("%s: %d" % (k1, k1_sum))
            print()
            
print("total: %d" % (proc_sum))


Updates where the submitter is currently unrelated to the project

unrelated: 5

Updates that need to get notified directly

owner all: 10
owner contrib: 5
owner: 15

group: 1

total: 16


## Resolution

Here are the lists of updates that need to be dealt with.

These helper functions print the update information neatly.

In [16]:
def print_update(update):
    print("- title: %s" % update['title'])
    print("  url: %s" % update['url'])
    print("  user: %s" % update['user']['name'])
    print("  status: %s" % update['status'])
    print("  karma: %s" % update['karma'])
    print("  date_submitted: %s" % update['date_submitted'])
    for project in update['projects']:
        print("  - name: %s" % project['name'])
        print("    url: %s" % project['full_url'])
        print("    owner: %s" % project['user']['name'])
        users = list(
            {
                user for sublist in [
                    project['access_users'][x] for x in project['access_users'] if (
                        x in ["admin", "commit", "collaborator", "owner"]
                    )
                ] for user in sublist
            })
        groups = list(
            {
                group for sublist in [
                    project['access_groups'][x] for x in project['access_groups'] if (
                        x in ["admin", "commit", "collaborator"]
                    )
                ] for group in sublist
            })
        print("    users: %s" % users)
        print("    groups: %s" % groups)
        print()

def print_update_links(update):
    print("%s" % update['url'])
    for project in update['projects']:
        print("- %s" % project['full_url'])

def print_update_link_short(update):
    print("%s" % update['url'])


### Retired projects

These might need to be dealt with case by case to see if they are actually retired.

In [17]:
for k1 in retired:
    print(k1)
    if type(retired[k1]) is list:
        for x in retired[k1]:
            print_update_links(x)
    else:
        for k2 in retired[k1]:
            print(k1, k2)
            for x in retired[k1][k2]:
                print_update_links(x)

some
some rawhide
some epel
all
all rawhide
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1fd3c00778
- https://src.fedoraproject.org/rpms/cherokee
all epel
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4a61145af1
- https://src.fedoraproject.org/rpms/orafce
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-cd85b86bb1
- https://src.fedoraproject.org/rpms/lz4
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-3e26ebd428
- https://src.fedoraproject.org/rpms/libgringotts
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-84f0138f7d
- https://src.fedoraproject.org/rpms/pidgin-sipe
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-a298c388e0
- https://src.fedoraproject.org/rpms/psad
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-bee0db88c2
- https://src.fedoraproject.org/rpms/pgmodeler
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2017-631efd272e
- https://src.fedoraproject.org/rpms/python-XStatic-Patternfly-Bootstrap-Treeview
ht

### Orphaned

These can be discarded, since no one is taking the lead in maintaining the package.

In [18]:
for k1 in orphaned:
    print(k1)
    if type(orphaned[k1]) is list:
        for x in orphaned[k1]:
            print_update_link_short(x)
    else:
        for k2 in orphaned[k1]:
            print(k1, k2)
            for x in orphaned[k1][k2]:
                print_update_link_short(x)

all
all orphan
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-a5199f34b3
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-2f4a8e068f
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-74e296ce58
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-7a50412263
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-a12622f56c
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1d8ebe1be3
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-b9bb950694
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-14d64385a6
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-37c0b14a78
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-0b1c8b8943
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-658c2c6df3
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-0ed34e82ee
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-786a55da7b
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-8946616811
https://bodhi.fedoraproject.org/u