Skip to content

failed reconnect behavior after connectivity outage #996

@robarnes

Description

@robarnes

Context

  • OS : Raspbian 5.15.32-v7+ on PiZero2W
  • Python 3.9.2
  • pip 20.3.4 from /usr/lib/python3/dist-packages/pip (python 3.9)
  • packages installed:
    Package Version

azure-core 1.21.1
azure-iot-device 2.9.0
azure-iot-hub 2.6.0
azure-storage-blob 12.9.0
certifi 2020.6.20
cffi 1.15.0
chardet 4.0.0
colorzero 1.1
cryptography 36.0.0
deprecation 2.1.0
distro 1.5.0
gpiozero 1.6.2
idna 2.10
isodate 0.6.0
janus 0.7.0
msrest 0.6.21
numpy 1.21.4
oauthlib 3.1.1
opencv-python 4.5.4.60
packaging 21.3
paho-mqtt 1.6.1
picamera 1.13
pip 20.3.4
psutil 5.9.0
PyAudio 0.2.11
pycparser 2.21
pyftdi 0.53.3
pynmea2 1.18.0
pyparsing 3.0.6
pyserial 3.5
PySocks 1.7.1
python-apt 2.2.1
pyusb 1.2.1
requests 2.25.1
requests-oauthlib 1.3.0
requests-unixsocket 0.2.0
rpi-ws281x 4.3.1
RPi.GPIO 0.7.0
schedule 1.1.0
scipy 1.7.3
setuptools 52.0.0
SI1145 1.0.2
six 1.16.0
smbus 1.1.post2
smbus2 0.4.1
spidev 3.5
ssh-import-id 5.10
sysv-ipc 1.1.0
typing-extensions 4.0.1
uamqp 1.4.3
urllib3 1.26.5
VL53L0X 1.0.4
wheel 0.34.2

Description of the issue

Azure IOTHub SDK not reconnecting properly after internet/connectivity outage. See final error(s) in log below.

Code sample exhibiting the issue

def iotHubClientInit():            # gets the IotHub Client details using the connection string
    try:
        connectionString = "xxx"  #this comes from Azure
        client = IoTHubDeviceClient.create_from_connection_string(connectionString)
        return client
    except:
        logging.critical("Failed to connect to Azure IOTHub")

def requestSchedule():
    iothubMessageText = '{{"trollyID": "{trollyID}", "schedule_req": 1}}'    
    iothubMessageText_formatted = iothubMessageText.format(trollyID=trollyID)
    message = Message(iothubMessageText_formatted)
    logging.info("Requested schedule from cloud")
    try:
        client.send_message(message)
        return True
    except:
        logging.critical("Failed to request schedule to Azure IOTHub")

def messageFromAzure(message):
    incomingCommands = json.loads(message.data)
    #logging.info(f"Message received: {message.data}")

    if isinstance(incomingCommands, list):
        schedule.clear('cloud-scheduled') #clear existing online scheduled tasks
        trollyState.heartBeat(True) #reset our heartbeats
        for incomingList in incomingCommands:
            if 'freq_type' in incomingList.keys() and 'freq_interval' in incomingList.keys(): #this is a schedule list
                logging.info(f"Incoming schedule: {incomingList}")
                sched(incomingList)
        for job in schedule.get_jobs(): #lets log the jobs nicely
            #logging.info(f"Scheduled job: {job}")
            if job.tags: # show additional detail for cloud scheduled jobs
                logging.info(f"    {job.start_day} at {job.at_time}")
    elif incomingCommands["Type"] == "System" and incomingCommands["Message"] == "stop":
        pause()
        logging.info("Pausing due to stop message from cloud")
   
    else:
        logging.critical(f"Command from Azure I'm not sure what to do with :: {incomingCommands}")

Console log of the issue

	#the request schedule is iot device sending a 'ping' message to iothub
	#the return is iothub responding (after hitting microservice/sql backend)
2022-05-10 20:14:51, INFO - Requested schedule from cloud (line 1212)
2022-05-10 20:14:56, INFO - hearbeat returned in 4.761431694030762 seconds (line 425)
2022-05-10 20:44:56, INFO - Requested schedule from cloud (line 1212)
2022-05-10 20:45:00, INFO - hearbeat returned in 3.891179084777832 seconds (line 425)
2022-05-10 21:15:00, INFO - Requested schedule from cloud (line 1212)
2022-05-10 21:15:05, INFO - hearbeat returned in 4.768422365188599 seconds (line 425)
	#wifi outtage - and when no internet connectivity, the code does not attempt azureHub ping/heartbeats
2022-05-10 21:45:51, WARNING - ReconnectStage: DisconnectEvent received while in unexpected state - DISCONNECTED, Connected: False (line 1281)
2022-05-10 21:45:51, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 21:45:51, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Unexpected disconnection') caused by ProtocolClientError('Unknown CONNACK rc==16')\n"] (line 29)
2022-05-10 21:45:51, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 21:45:51, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Unexpected disconnection')\n"] (line 29)
2022-05-10 21:45:57, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 21:45:57, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 21:46:14, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 21:46:14, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 21:46:30, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
	#repeat a whole bunch, snipped for brevity
2022-05-10 21:58:25, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 21:58:36, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 21:58:36, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 21:58:46, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 21:58:46, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 21:58:56, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 21:58:56, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 22:04:08, WARNING - ReconnectStage: DisconnectEvent received while in unexpected state - DISCONNECTED, Connected: False (line 1281)
2022-05-10 22:04:08, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:04:08, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Unexpected disconnection') caused by ProtocolClientError('Unknown CONNACK rc==16')\n"] (line 29)
2022-05-10 22:04:08, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:04:08, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Unexpected disconnection')\n"] (line 29)
2022-05-10 22:04:21, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:04:21, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 22:04:41, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:04:41, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 22:05:04, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:05:04, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 22:05:26, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
	#repeat a whole bunch, snipped for brevity
2022-05-10 22:12:09, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 22:12:32, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:12:32, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
2022-05-10 22:12:52, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:12:52, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"] (line 29)
	#wifi back, the internet connected message is python code noting successful ping to nameserver
2022-05-10 22:15:08, INFO - Internet connectivity has been restored (line 1450)
	#back to pinging azure iothub (these don't happen when offline)
2022-05-10 22:15:08, INFO - Requested schedule from cloud (line 1212)
2022-05-10 22:15:12, INFO - hearbeat returned in 4.2701239585876465 seconds (line 425)
	#received message from ipad (seperate device) through iothub askign for telemetry, and this device successfully answers
2022-05-10 22:19:10, INFO - Sending message to Azure: {"xLocation": 0, "yLocation": 40, "zLocation": 1, "batLevel": 86, "CO2": -999, "temperature": -999,"humidity": -999, "UV": -999, "FS":-999, "IR": -999} (line 1185)
	#believe there are two replies as i requested telemetry twice on the ipad
2022-05-10 22:19:20, INFO - Sending message to Azure: {"xLocation": 0, "yLocation": 40, "zLocation": 1, "batLevel": 86, "CO2": -999, "temperature": -999,"humidity": -999, "UV": -999, "FS":-999, "IR": -999} (line 1185)
2022-05-10 22:42:27, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:43:29, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Unexpected disconnection') caused by ConnectionDroppedError('The connection was lost.')\n"] (line 29)
2022-05-10 22:50:47, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 22:52:51, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1123)')\n"] (line 29)
2022-05-10 23:03:19, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 23:04:41, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1123)')\n"] (line 29)
2022-05-10 23:06:50, INFO - Requested schedule from cloud (line 1212)
2022-05-10 23:08:51, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 23:09:23, ERROR - ["azure.iot.device.common.pipeline.pipeline_exceptions.OperationTimeout: OperationTimeout('Transport timeout on connection operation')\n"] (line 29)
2022-05-10 23:15:10, CRITICAL - Failed to request schedule to Azure IOTHub (line 1217)
2022-05-10 23:15:55, ERROR - Exception caught in background thread.  Unable to handle. (line 28)
2022-05-10 23:16:31, INFO - Requested schedule from cloud (line 1212)
2022-05-10 23:16:42, ERROR - ["azure.iot.device.common.transport_exceptions.ConnectionDroppedError: ConnectionDroppedError('Unexpected disconnection')\n"] (line 29)
2022-05-10 23:34:09, WARNING - ReconnectStage: DisconnectEvent received while in unexpected state - CONNECTING, Connected: False (line 1281)
2022-05-10 23:35:47, ERROR - Exception caught in background thread.  Unable to handle. (line 28)

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions